https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95019

--- Comment #2 from zhongyunde at tom dot com <zhongyunde at tom dot com> ---
This is a generic issue for all targets. x86, for example, also doesn't expand
IVOPTs, as the index is not used for Dest and Src directly. We may need to
expand IVOPTs so that different targets can select a different candidate
according to their cost model.
For now it seems OK on x86, as its load/store insns can fold the lshift operand,
so it doesn't need a separate lshift in the loop body.
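
To make the "IV with step 8" idea concrete, here is a minimal sketch. It is not
the test case from this bug: the function names square_indexed and
square_stepped are hypothetical, and a simple unit-stride loop over 8-byte
elements is assumed. The first form addresses memory as base + (i << 3), which
a target either folds into the addressing mode or pays for with a separate
shift. The second form is roughly what selecting a pointer IV with a byte step
of 8 would correspond to at the source level.

```c
#include <stddef.h>
#include <stdint.h>

/* Index form: every access computes base + (i << 3). */
void square_indexed(size_t n, const uint64_t *src, uint64_t *dest)
{
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i] * src[i];
}

/* Hand strength-reduced form (illustrative assumption): both pointers
   advance by sizeof(uint64_t) == 8 bytes per iteration, so no shift of
   an index is needed inside the loop body. */
void square_stepped(size_t n, const uint64_t *src, uint64_t *dest)
{
    const uint64_t *end = src + n;
    for (; src != end; src++, dest++)
        *dest = *src * *src;
}
```

Both functions compute the same result; which form is cheaper depends on the
target's addressing modes, which is why the choice belongs in the cost model.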

========== Based on ARM gcc 9.2.1 at https://gcc.godbolt.org, you get a
separate lshift insn (lsl) in the loop kernel, while ARM64 gcc 8.2 uses
ldr x3, [x1, x4, lsl 3] to avoid the separate lshift. So we can see that no
target selects an IV with step 8.
C00000ADA(unsigned long long, long long*, long long*):
        push    {r4, r5, r6, r7, lr}    @
        mov     r4, r0    @ len, tmp135
        mov     r5, r1    @ len, tmp136
        orrs    r1, r4, r5      @ tmp137, len
        beq     .L1             @,
        mov     r1, #0    @ C000005A1,
.L3:
        lsl     r0, r1, #3        @ _2, C000005A1,
        add     ip, r2, r1, lsl #3        @ tmp120, Src, C000005A1,
        ldr     lr, [r2, r0]      @ _4, *_3
        ldr     ip, [ip, #4]      @ _4, *_3
        umull   r6, r7, lr, lr        @ tmp125, _4, _4
        mul     ip, lr, ip        @ tmp122, _4, tmp122
        adds    r1, r1, r4      @ C000005A1, C000005A1, len
        subs    r4, r4, #1      @ len, len,
        sbc     r5, r5, #0        @ len, len,
        add     r0, r3, r0        @ tmp121, Dest, _2
        add     r7, r7, ip, lsl #1        @,, tmp122,
        orrs    lr, r4, r5      @ tmp138, len
        stm     r0, {r6-r7}       @ *_5, tmp125
        bne     .L3             @,
.L1:
        pop     {r4, r5, r6, r7, lr}      @
        bx      lr  @

Thanks for your notice.
