https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106678
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The inner loop for aarch64 on the trunk is:
.L5:
        ldr     x7, [x20, x5, lsl 3]
        ldr     x10, [x21, x12, lsl 3]
        ldr     x6, [x11, x5, lsl 3]
        mul     x2, x7, x10
        umulh   x7, x7, x10
        adds    x2, x2, x8
        cinc    x8, x7, cs
        adds    x2, x2, x6
        cset    x7, cs
        adds    x2, x2, x9
        add     x6, x6, x2
        str     x6, [x11, x5, lsl 3]
        add     x5, x5, 1
        cinc    x9, x7, cs
        cmp     x19, x5
        bne     .L5

So I suspect this is still a target issue.
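
For readers unfamiliar with the mul/umulh pattern in the listing, here is a minimal C sketch of the kind of multi-precision multiply-accumulate loop that typically compiles to such a sequence on aarch64. It is not the test case from this bug report: the function name addmul_1, its signature, and the loop body are assumptions made purely for illustration, and it relies on the unsigned __int128 extension provided by GCC and Clang on 64-bit targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the bug report): computes r[i] += a[i] * b
   over n 64-bit limbs and returns the final carry-out limb. */
uint64_t addmul_1(uint64_t *r, const uint64_t *a, size_t n, uint64_t b)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Full 64x64 -> 128-bit product; on aarch64 the low and high
           halves are usually produced by mul and umulh respectively. */
        unsigned __int128 t = (unsigned __int128)a[i] * b;
        /* Adding two more 64-bit values cannot overflow 128 bits:
           (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1. */
        t += r[i];
        t += carry;
        r[i] = (uint64_t)t;           /* low limb goes back to memory */
        carry = (uint64_t)(t >> 64);  /* high limb carries into the next step */
    }
    return carry;
}
```

In a loop like this, each 128-bit addition has to propagate a carry from the low limb into the high limb, which is where adds/cinc/cset style sequences tend to come from.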