Hello!
I maintain a fork of GCC which adds support for my custom CPU ISA,
MRISC32 (the machine description can be found here:
https://github.com/mrisc32/gcc-mrisc32/tree/mbitsnbites/mrisc32/gcc/config/mrisc32
).
I recently discovered that scaled index addressing (i.e. MEM[base +
index * scale]) is not generated inside loops, but I have not been able to
figure out why.
I believe that I have all the required plumbing in the machine
description (MAX_REGS_PER_ADDRESS, REGNO_OK_FOR_BASE_P,
REGNO_OK_FOR_INDEX_P, etc.), and I have verified that scaled index
addressing is used in trivial cases like this:
char carray[100];
short sarray[100];
int iarray[100];
void single_element(int idx, int value) {
  carray[idx] = value; // OK
  sarray[idx] = value; // OK
  iarray[idx] = value; // OK
}
...which produces the expected machine code similar to this:
stb r2, [r3, r1] // OK
sth r2, [r3, r1*2] // OK
stw r2, [r3, r1*4] // OK
However, when the array assignment happens inside a loop, only the char
version uses index addressing. The other sizes (short and int) will be
transformed into code where the addresses are stored in registers that
are incremented by +2 and +4 respectively.
void loop(void) {
  for (int idx = 0; idx < 100; ++idx) {
    carray[idx] = idx; // OK
    sarray[idx] = idx; // BAD
    iarray[idx] = idx; // BAD
  }
}
...which produces:
.L4:
sth r1, [r3] // BAD
stw r1, [r2] // BAD
stb r1, [r5, r1] // OK
add r1, r1, #1
sne r4, r1, #100
add r3, r3, #2 // (BAD)
add r2, r2, #4 // (BAD)
bs r4, .L4
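To make the difference concrete, here is roughly what the loop at .L4 does when written back as C. This is a hand-written sketch based on my reading of the assembly above, not compiler output. The function name loop_as_generated and the pointer names sp and ip are made up for illustration.

```c
/* Same arrays as in the examples above. */
char carray[100];
short sarray[100];
int iarray[100];

/* Hypothetical C equivalent of the generated loop: the short and int
   stores go through separate pointers that are bumped every iteration,
   while the char store still uses base + index addressing. */
void loop_as_generated(void) {
  short *sp = sarray; /* r3: advanced by 2 bytes per iteration */
  int *ip = iarray;   /* r2: advanced by 4 bytes per iteration */
  for (int idx = 0; idx != 100; ++idx) { /* r1 is the loop counter */
    *sp = idx;          /* sth r1, [r3] */
    *ip = idx;          /* stw r1, [r2] */
    carray[idx] = idx;  /* stb r1, [r5, r1] */
    ++sp;               /* add r3, r3, #2 */
    ++ip;               /* add r2, r2, #4 */
  }
}
```

So the loop keeps three live induction variables (r1, r2, r3) where a single index register plus scaled addressing would do.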
I would expect scaled index addressing to be used in loops too, just as
is done for AArch64 for instance. I have dug around in the machine
description, but I can't really figure out what's wrong.
For reference, here is the same code in Compiler Explorer, including the
code generated for AArch64 for comparison: https://godbolt.org/z/drzfjsxf7
Passing -da (dump RTL all) to gcc, I can see that the decision to not
use index addressing has been made already in *.253r.expand.
Does anyone have any hints about what could be wrong and where I should
start looking?
Regards,
Marcus