https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123524

--- Comment #9 from mikulas at artax dot karlin.mff.cuni.cz ---
I bisected the other problem (the scaled addressing modes not being used). It
is caused by commit 24bc02b1eda3795163616c02725ee14bac9d975c ("gimple-fold:
Remove assume_aligned folding").

The source code uses __builtin_assume_aligned when accessing the variables on
the frame:
#define frame_var(fp, idx) \
    (cast_ptr(unsigned char *, \
        __builtin_assume_aligned(frame_char_(fp) + ((size_t)(idx) << slot_bits), \
                                 slot_size)))

When I delete __builtin_assume_aligned from the source code, the scaled
addressing is properly generated.
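
For anyone who wants to compare the two variants in isolation, here is a minimal sketch of the pattern. It is not the project's actual code: the values of slot_bits and slot_size and the helper names (slot_hinted, slot_plain, add_hinted, add_plain) are assumptions made up for illustration, and whether this reduced form triggers the same codegen difference has not been checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: the names mirror the macro above, but the
   values are assumed, not taken from the real source. */
#define slot_bits 3
#define slot_size ((size_t)1 << slot_bits)

/* Slot address with the alignment hint, as in the reported macro. */
static inline int64_t *slot_hinted(unsigned char *fp, uint32_t idx)
{
    return (int64_t *)__builtin_assume_aligned(
        fp + ((size_t)idx << slot_bits), slot_size);
}

/* Same address computation without the hint. */
static inline int64_t *slot_plain(unsigned char *fp, uint32_t idx)
{
    return (int64_t *)(fp + ((size_t)idx << slot_bits));
}

/* Load two slots, add them, store the result into a third slot. */
int64_t add_hinted(unsigned char *fp, uint32_t a, uint32_t b, uint32_t r)
{
    int64_t sum = *slot_hinted(fp, a) + *slot_hinted(fp, b);
    *slot_hinted(fp, r) = sum;
    return sum;
}

int64_t add_plain(unsigned char *fp, uint32_t a, uint32_t b, uint32_t r)
{
    int64_t sum = *slot_plain(fp, a) + *slot_plain(fp, b);
    *slot_plain(fp, r) = sum;
    return sum;
}
```

Compiling this with -O2 -S and comparing the bodies of add_hinted and add_plain should show whether the loads and the store use scaled register-offset addressing in both cases.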


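BTW, gcc-16 on arm64 also generates slightly worse code with
__builtin_assume_aligned: the store address is computed by a separate add
(at fe88) and the store itself is a plain "str x0, [x2]", instead of a single
str with a scaled register offset (at fd5c).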

With __builtin_assume_aligned:
    fe6c:       b8402260        ldur    w0, [x19, #2]
    fe70:       b8406261        ldur    w1, [x19, #6]
    fe74:       b840a262        ldur    w2, [x19, #10]
    fe78:       38606a83        ldrb    w3, [x20, x0]
    fe7c:       38616a84        ldrb    w4, [x20, x1]
    fe80:       2b04007f        cmn     w3, w4
    fe84:       54ff29e1        b.ne    e3c0 <u_run+0x9aa0>  // b.any
    fe88:       8b224e82        add     x2, x20, w2, uxtw #3
    fe8c:       f8607a80        ldr     x0, [x20, x0, lsl #3]
    fe90:       f8617a81        ldr     x1, [x20, x1, lsl #3]
    fe94:       ab010000        adds    x0, x0, x1
    fe98:       54ff2946        b.vs    e3c0 <u_run+0x9aa0>
    fe9c:       f9000040        str     x0, [x2]
    fea0:       78412e61        ldrh    w1, [x19, #18]!
    fea4:       90000000        adrp    x0, 0 <FIXED_binary_divide_int8_t>
    fea8:       91000000        add     x0, x0, #0x0
    feac:       f861d800        ldr     x0, [x0, w1, sxtw #3]
    feb0:       d61f0000        br      x0
Without __builtin_assume_aligned:
    fd30:       b8402260        ldur    w0, [x19, #2]
    fd34:       b8406261        ldur    w1, [x19, #6]
    fd38:       b840a262        ldur    w2, [x19, #10]
    fd3c:       38606a83        ldrb    w3, [x20, x0]
    fd40:       38616a84        ldrb    w4, [x20, x1]
    fd44:       2b04007f        cmn     w3, w4
    fd48:       54ff29e1        b.ne    e284 <u_run+0x9964>  // b.any
    fd4c:       f8607a80        ldr     x0, [x20, x0, lsl #3]
    fd50:       f8617a81        ldr     x1, [x20, x1, lsl #3]
    fd54:       ab010000        adds    x0, x0, x1
    fd58:       54ff2966        b.vs    e284 <u_run+0x9964>
    fd5c:       f8225a80        str     x0, [x20, w2, uxtw #3]
    fd60:       78412e61        ldrh    w1, [x19, #18]!
    fd64:       90000000        adrp    x0, 0 <FIXED_binary_divide_int8_t>
    fd68:       91000000        add     x0, x0, #0x0
    fd6c:       f861d800        ldr     x0, [x0, w1, sxtw #3]
    fd70:       d61f0000        br      x0
