https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #10 from Wilco <wdijkstr at arm dot com> ---
The loops shown are not the correct inner loops for those options - with
-ffast-math they are vectorized. LLVM unrolls 2x but GCC doesn't. So the
question is why GCC doesn't unroll vectorized loops like LLVM?

GCC:

.L24:
        ldr     q3, [x13, x5]
        add     x6, x6, 1
        ldr     q2, [x16, x5]
        cmp     x6, x12
        add     x5, x5, 16
        fmla    v1.2d, v3.2d, v2.2d
        bcc     .L24

LLVM:

.LBB2_12:
        ldur    q2, [x8, #-16]
        ldr     q3, [x8], #32
        ldur    q4, [x21, #-16]
        ldr     q5, [x21], #32
        fmla    v1.2d, v2.2d, v4.2d
        fmla    v0.2d, v3.2d, v5.2d
        sub     x30, x30, #4            // =4
        cbnz    x30, .LBB2_12
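
To make the difference between the two listings concrete, here is a minimal C sketch. It assumes the source loop is a dot-product-style reduction over doubles, which the fmla accumulation suggests but this comment does not show. The function names dot_simple and dot_unrolled2 are hypothetical, and the scalar code only models the accumulator structure, not the actual vectorizer output.

```c
#include <stddef.h>

/* One accumulator: every multiply-add depends on the previous one,
   like the single fmla into v1 in the GCC loop. */
double dot_simple(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Unrolled 2x with two independent accumulators, like the two fmla
   instructions into v1 and v0 in the LLVM loop. The two dependency
   chains can overlap in the pipeline. */
double dot_unrolled2(const double *a, const double *b, size_t n)
{
    double sum0 = 0.0, sum1 = 0.0;
    size_t i = 0;

    /* Main loop: two elements per iteration, one per accumulator. */
    for (; i + 1 < n; i += 2) {
        sum0 += a[i] * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }

    /* Tail: handle the last element when n is odd. */
    if (i < n)
        sum0 += a[i] * b[i];

    /* Combine the partial sums once, after the loop. */
    return sum0 + sum1;
}
```

Splitting the sum across two accumulators changes the order in which the floating-point additions happen, so the result may differ in rounding from dot_simple. That kind of reassociation is what -ffast-math permits, which fits the observation that the loops are only vectorized with that option.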