https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827
--- Comment #1 from Robin Dapp <rdapp at gcc dot gnu.org> ---
x86 (-march=native -O3 on an i7 12th gen) looks pretty similar:

.L3:
        movq    (%rdi), %rax
        vmovups (%rax), %xmm1
        vdivps  %xmm0, %xmm1, %xmm1
        vmovups %xmm1, (%rax)
        addq    $16, %rax
        movq    %rax, (%rdi)
        addq    $8, %rdi
        cmpq    %rdi, %rdx
        jne     .L3

So probably not target specific.  Costing?
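
For readers less used to AT&T syntax, here is a rough C rendering of what the loop above appears to do. It is reconstructed from the assembly alone and is not the testcase from this PR. The name scale_chunks, its parameters, and the assumption that %xmm0 holds the divisor broadcast to all four lanes are mine.

```c
#include <stddef.h>

/* Hypothetical reconstruction of the .L3 loop, not the PR's source.
   pp walks an array of float pointers (%rdi), end is one past the
   last slot (%rdx), scaler is the value assumed to be in %xmm0.  */
void scale_chunks(float **pp, float **end, float scaler)
{
    for (; pp != end; ++pp) {          /* addq $8, %rdi; cmpq; jne   */
        float *p = *pp;                /* movq (%rdi), %rax          */
        for (size_t k = 0; k < 4; ++k) /* vmovups / vdivps / vmovups */
            p[k] /= scaler;
        *pp = p + 4;                   /* addq $16, %rax;
                                          movq %rax, (%rdi)          */
    }
}
```

Each iteration loads a pointer from the array, divides the four floats it points at, then stores the advanced pointer back into the same slot.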