https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95019
--- Comment #2 from zhongyunde at tom dot com <zhongyunde at tom dot com> ---
This is a generic issue for all targets. On x86, for example, IVOPTs is also
not expanded, because the index is not used directly for Dest and Src. We may
need to expand IVOPTs so that different targets can each select a different
candidate according to their cost model.

For now the result seems OK on x86, because its load/store insns fold the
lshift operand, so no separate lshift is needed in the loop body.

==========
Based on ARM gcc 9.2.1 on https://gcc.godbolt.org, you get a separate lshift
(lsl) in the loop kernel, while ARM64 gcc 8.2 uses
        ldr     x3, [x1, x4, lsl 3]
to avoid the separate lshift. So we can see that none of these targets selects
an IV with step 8.

C00000ADA(unsigned long long, long long*, long long*):
        push    {r4, r5, r6, r7, lr}    @
        mov     r4, r0  @ len, tmp135
        mov     r5, r1  @ len, tmp136
        orrs    r1, r4, r5      @ tmp137, len
        beq     .L1             @,
        mov     r1, #0  @ C000005A1,
.L3:
        lsl     r0, r1, #3      @ _2, C000005A1,
        add     ip, r2, r1, lsl #3      @ tmp120, Src, C000005A1,
        ldr     lr, [r2, r0]    @ _4, *_3
        ldr     ip, [ip, #4]    @ _4, *_3
        umull   r6, r7, lr, lr  @ tmp125, _4, _4
        mul     ip, lr, ip      @ tmp122, _4, tmp122
        adds    r1, r1, r4      @ C000005A1, C000005A1, len
        subs    r4, r4, #1      @ len, len,
        sbc     r5, r5, #0      @ len, len,
        add     r0, r3, r0      @ tmp121, Dest, _2
        add     r7, r7, ip, lsl #1      @,, tmp122,
        orrs    lr, r4, r5      @ tmp138, len
        stm     r0, {r6-r7}     @ *_5, tmp125
        bne     .L3             @,
.L1:
        pop     {r4, r5, r6, r7, lr}    @
        bx      lr      @

Thanks for your notice.
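
To make the "IV with step 8" point concrete, here is a minimal C sketch of the two addressing forms. It is not the test case from this report: the function names, the unit-stride loop and the size_t counter are assumptions chosen for illustration, and it assumes sizeof(long long) == 8 as on the targets discussed above. The first form computes each address from an index, which implies a shift by 3 unless the target folds it into the load/store. The second form uses the pointers themselves as induction variables, each advancing by 8 bytes per iteration.

```c
#include <stddef.h>

/* Indexed form: each address is base + (i << 3). A target without a
   scaled addressing mode needs a separate shift in the loop body. */
void square_indexed(size_t n, const long long *src, long long *dest)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i] * src[i];
}

/* Pointer form (hand-written strength reduction): the induction
   variables are the pointers, each stepping by sizeof(long long)
   bytes per iteration, so no shift is needed to form the address. */
void square_step8(size_t n, const long long *src, long long *dest)
{
    const long long *end = src + n;
    for (; src != end; src++, dest++)
        *dest = *src * *src;
}
```

Both functions compute the same result. Which form is cheaper depends on the target's addressing modes, so the choice is one the cost model would have to make.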