On 07 Jul 2015, at 13:52, Bin.Cheng <[email protected]> wrote:
> On Tue, Jul 7, 2015 at 10:05 AM, Anmol Paralkar (anmparal)
> <[email protected]> wrote:
>> Hello,
>>
>> Does GCC generate LDRD/STRD (Register) forms [A8.8.74/A8.8.211 per ARMv7-A
>> & ARMv7-R ARM]?
>>
>> Based on various attempts to write code to get GCC to generate a sample
>> form, and subsequently inspecting the code I see in
>> config/arm/arm.c/output_move_double () & arm.md [GCC 4.9.2], I think that
>> these register based forms of LDRD/STRD are
>> not generated, but I thought it might be a good idea to ask on the list,
>> just in case.
>
> Register based LDRD is harder than the immediate version. ARM doesn't
> support a [base + reg + offset] addressing mode, so the address computation
> of the second memory reference is scattered both inside and outside the
> memory reference. To identify such opportunities, one needs to trace the
> registers in the address expression of the memory access instruction and
> do some kind of value computation and re-association.

Basically, this is what we're trying to do with AMS (address mode selection).
For each mem access, it tries to trace the reg values and figure out the
effective address expression. For now, we've limited it to the form
'base_reg + index_reg*scale + const_displacement'. Then we try to see how to
fit the address expressions to the available address modes.
It's still a work in progress, but it already shows some improvements.
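
As a rough illustration of fitting an address expression to the available
address modes (a sketch only, not the actual AMS code), the expression can be
modelled as a small struct and matched against a list of modes. The struct,
the enum, the function name fit_addr_mode and the displacement limit are all
made up for this example and describe an imaginary target, not the real SH or
ARM mode set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an effective address:
   base_reg + index_reg * scale + disp.
   A register number of -1 means "not used".  */
struct addr_expr {
    int base_reg;
    int index_reg;
    int scale;
    long disp;
};

/* Address modes of an imaginary target (assumption for this sketch).  */
enum addr_mode {
    MODE_REG,        /* @base                                   */
    MODE_REG_DISP,   /* @(disp, base)                           */
    MODE_REG_INDEX,  /* @(index, base)                          */
    MODE_NONE        /* no single mode fits, extra insns needed */
};

/* Assumed maximum displacement of the imaginary target.  */
enum { MAX_DISP = 60 };

enum addr_mode fit_addr_mode(const struct addr_expr *e)
{
    assert(e != NULL && e->base_reg >= 0);
    bool has_index = e->index_reg >= 0;

    if (!has_index && e->disp == 0)
        return MODE_REG;
    if (!has_index && e->disp > 0 && e->disp <= MAX_DISP)
        return MODE_REG_DISP;
    if (has_index && e->scale == 1 && e->disp == 0)
        return MODE_REG_INDEX;

    /* E.g. base + index + disp: part of the address has to be
       computed by a separate insn first.  */
    return MODE_NONE;
}
```

In this model, a 'base + index + disp' expression ends up as MODE_NONE, which
is the [base + reg + offset] situation mentioned above.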

A classic SH4 example:

    float fun (float* x)
    {
      return x[0] + x[1] + x[2] + x[3];
    }

no AMS:
    mov     r4,r1
    add     #4,r1
    fmov.s  @r4,fr0
    fmov.s  @r1,fr1
    mov     r4,r1
    add     #8,r1
    fadd    fr1,fr0
    fmov.s  @r1,fr1
    add     #12,r4
    fadd    fr1,fr0
    fmov.s  @r4,fr1
    rts
    fadd    fr1,fr0

AMS:
    fmov.s  @r4+,fr0
    fmov.s  @r4+,fr1
    fadd    fr1,fr0
    fmov.s  @r4+,fr1
    fadd    fr1,fr0
    fmov.s  @r4,fr1
    rts
    fadd    fr1,fr0
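
The choice of post-increment modes in the AMS listing can be mimicked with a
very small sketch. This is only a toy model, not what AMS actually does: it
assumes all accesses use the same base register, that the register is dead
afterwards (so it may be modified), and that the accesses stay in their
original order. The names access_kind and select_postinc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum access_kind {
    ACC_PLAIN,     /* @reg  : address is already in the register    */
    ACC_POSTINC,   /* @reg+ : use, then advance by the access size  */
    ACC_NEEDS_ADD  /* address must be computed by a separate add    */
};

/* disp[i] is the byte offset of access i from the original value of
   the base register.  out[i] receives the chosen kind for access i.  */
void select_postinc(const long *disp, size_t n, long size,
                    enum access_kind *out)
{
    assert(disp != NULL && out != NULL && size > 0);
    long cur = 0;  /* offset currently held in the base register */

    for (size_t i = 0; i < n; i++) {
        if (disp[i] != cur) {
            /* Register does not hold this address; use a temp.  */
            out[i] = ACC_NEEDS_ADD;
        } else if (i + 1 < n && disp[i + 1] == cur + size) {
            /* Next access is the following element: post-inc pays off.  */
            out[i] = ACC_POSTINC;
            cur += size;
        } else {
            out[i] = ACC_PLAIN;
        }
    }
}
```

For the displacements 0, 4, 8, 12 with a 4 byte access size, this picks
post-increment for the first three loads and a plain @reg for the last one,
which is the shape of the AMS output above.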

If I understand correctly, ARM's LDRD/STRD are similar to SH's FPU 2x32 pair
loads/stores. They need the mem access insns for adjacent addresses to be
adjacent in the insn stream. We'll try to do some mem access reordering in
AMS, mainly to improve post/pre inc/dec address mode utilization. Afterwards,
adjacent mem accesses can be fused together in a separate RTL pass or an AMS
sub-pass to avoid re-discovering the mem access sequence information, which
AMS already has.
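
A minimal sketch of such a fusing step, assuming the accesses have already
been reordered and are described by (base reg, displacement, size) triples.
The struct mem_access and the function find_adjacent_pairs are made up for
this example. Real LDRD/STRD or SH pair moves have further constraints
(register pairing, alignment) that are ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one mem access in insn stream order.  */
struct mem_access {
    int base_reg;
    long disp;
    int size;   /* in bytes */
};

/* Sets pair_start[i] to 1 if access i and access i + 1 can be fused
   into one 2x32 bit access, 0 otherwise.  Returns the number of pairs.
   Pairs are picked greedily from the start of the stream.  */
size_t find_adjacent_pairs(const struct mem_access *acc, size_t n,
                           int *pair_start)
{
    assert(acc != NULL && pair_start != NULL);
    size_t pairs = 0;

    for (size_t k = 0; k < n; k++)
        pair_start[k] = 0;

    size_t i = 0;
    while (i + 1 < n) {
        const struct mem_access *a = &acc[i];
        const struct mem_access *b = &acc[i + 1];

        if (a->base_reg == b->base_reg
            && a->size == 4 && b->size == 4
            && b->disp == a->disp + 4) {
            pair_start[i] = 1;
            pairs++;
            i += 2;   /* both accesses are consumed by the pair */
        } else {
            i++;
        }
    }
    return pairs;
}
```

With four 32 bit accesses at displacements 0, 4, 8, 12 from the same base
register, this finds two pairs, starting at the first and third access.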
Cheers,
Oleg