Hi Siddhesh,

I still don't like the idea of disabling a whole class of instructions in the
md file. It seems much better to adjust the costs here so that you get most of
the improvement now, and fine-tune them once we can differentiate between
loads and stores.

Taking your example, adding -funroll-loops generates this for Falkor:

        ldr     q7, [x2, x18]
        add     x5, x18, 16
        add     x4, x1, x18
        add     x10, x18, 32
        add     x11, x1, x5
        add     x3, x18, 48
        add     x12, x1, x10
        add     x9, x18, 64
        add     x14, x1, x3
        add     x8, x18, 80
        add     x15, x1, x9
        add     x7, x18, 96
        add     x16, x1, x8
        str     q7, [x4]
        ldr     q16, [x2, x5]
        add     x6, x18, 112
        add     x17, x1, x7
        add     x18, x18, 128
        add     x5, x1, x6
        cmp     x18, x13
        str     q16, [x11]
        ldr     q17, [x2, x10]
        str     q17, [x12]
        ldr     q18, [x2, x3]
        str     q18, [x14]
        ldr     q19, [x2, x9]
        str     q19, [x15]
        ldr     q20, [x2, x8]
        str     q20, [x16]
        ldr     q21, [x2, x7]
        str     q21, [x17]
        ldr     q22, [x2, x6]
        str     q22, [x5]
        bne     .L25

If you adjust the costs instead, you'd get this:

.L25:
        ldr     q7, [x14]
        add     x14, x14, 128
        add     x4, x4, 128
        str     q7, [x4, -128]
        ldr     q16, [x14, -112]
        str     q16, [x4, -112]
        ldr     q17, [x14, -96]
        str     q17, [x4, -96]
        ldr     q18, [x14, -80]
        str     q18, [x4, -80]
        ldr     q19, [x14, -64]
        str     q19, [x4, -64]
        ldr     q20, [x14, -48]
        str     q20, [x4, -48]
        ldr     q21, [x14, -32]
        str     q21, [x4, -32]
        ldr     q22, [x14, -16]
        cmp     x14, x9
        str     q22, [x4, -16]
        bne     .L25
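A quick tally makes the difference concrete. The counts below were taken by
hand from the two listings above, one iteration of the unrolled loop each.
Both iterations copy eight 16-byte q registers, so the useful work is identical
and only the address arithmetic differs. This is a rough sketch that counts
instructions only; it says nothing about latency or how Falkor schedules the
stores.

```python
# Per-iteration instruction mix of the unrolled .L25 loop, counted by hand
# from the two Falkor listings. Both copy eight 16-byte q registers.
BYTES_PER_ITER = 8 * 16

loops = {
    # 8 ldr, 8 str, 16 add (address and induction updates), cmp + bne
    "first listing": {"ldr": 8, "str": 8, "add": 16, "cmp/bne": 2},
    # 8 ldr, 8 str, 2 add (one pointer bump per stream), cmp + bne
    "second listing (costs adjusted)": {"ldr": 8, "str": 8, "add": 2, "cmp/bne": 2},
}


def total(mix):
    """Total instructions executed per loop iteration."""
    return sum(mix.values())


for name, mix in loops.items():
    n = total(mix)
    print(f"{name:32} {n:2d} insns, {mix['add']:2d} adds, "
          f"{BYTES_PER_ITER / n:.1f} bytes/insn")
```

The second loop does the same 128 bytes of copying with 14 fewer
instructions, all of them address adds.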

So it seems to me that using the existing cost mechanisms is always
preferable, even if you currently can't differentiate between loads and
stores.

Wilco
