Hello,
We recently enabled LDMIA/STMIA instructions for Thumb-1 (Cortex-M0+) by
modifying ARM_AUTOINC_VALID_FOR_MODE_P to allow auto-increment addressing
for THUMB1 targets. However, we've discovered that IVOPTs generates
suboptimal code for simple loops due to incorrect addressing mode selection.
Consider this test case:
    void test(int *a, int *b, int size)
    {
        for (int i = 0; i < size; i++)
        {
            a[i] = b[i] * a[i];
        }
    }
GCC currently generates:
    ldmia r0!, {r4}
    ldmia r1!, {r6}
    subs r5, r0, #4
    ...
    str r4, [r5, #0]
The extra subs is needed to recompute the store address after r0 has already
been incremented. It appears because the candidate that IVOPTs selects as the
cheapest has the following structure:
    Candidate xxx:
      Incr POS: after use 0
      IV struct:
        Type: unsigned int
        Base: (unsigned int) a_13(D)
        Step: 4
This results in the following loop structure:
    loop-preheader:
        r0 = a
        jump loop-exiting
    loop-header:
        load-from [r0]
        increment r0
        store-to [r0, #-4]
    loop-exiting:
        jump loop-header
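For readers who prefer a source-level view, the loop shape above corresponds roughly to the pointer-based C below. This is only a hand-written approximation: IVOPTs works on GIMPLE, not C, and the names test_cand_xxx, pa, pb and end are illustrative and do not appear in any dump. The exit test is written as a pointer comparison as an assumption; the dump above does not show it.

```c
/* Hand-written approximation of the loop shape for candidate xxx.
   test_cand_xxx, pa, pb and end are illustrative names only. */
void test_cand_xxx(int *a, int *b, int size)
{
    if (size <= 0)
        return;

    int *pa = a;              /* the IV: base a, step 4 bytes */
    int *pb = b;
    int *end = a + size;      /* assumed exit bound, not shown in the dump */

    while (pa != end) {
        int va = *pa;         /* load-from [r0] */
        pa++;                 /* increment r0 */
        int vb = *pb++;       /* load from b, omitted in the sketch above */
        pa[-1] = vb * va;     /* store-to [r0, #-4] */
    }
}
```

The store through pa[-1] is the access that needs a negative offset once the pointer has been advanced.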
**Issue 1:** IVOPTs treats both of the following patterns as a valid
post-increment access at offset zero:
- "load-from [r0]; increment r0" → recognized as post-inc from offset 0
- "increment r0; store-to [r0, #-4]" → also recognized as post-inc from
  offset 0
The code in tree-ssa-loop-ivopts.cc:get_address_cost() applies the adjustment:
    if (stmt_after_increment (data->current_loop, cand, use->stmt))
      ainc_offset += ainc_step;
    cost = get_address_cost_ainc (ainc_step, ainc_offset,
                                  addr_mode, mem_mode, as, speed);
However, Thumb-1 does not support negative immediate offsets in addressing
modes. The pattern "increment r0; store-to [r0, #-4]" can never be realized
as a post-increment store on Thumb-1, yet IVOPTs assigns it a low cost.
**Question 1:** Should get_address_cost() verify that an addressing mode is
actually valid on the target before assigning auto-increment cost? Currently,
it appears to assume validity without checking target constraints.
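To make the question concrete, here is a toy model of the kind of check we have in mind. It is not GCC code: struct toy_target, enum toy_cost and toy_address_cost are hypothetical names, the cost values are arbitrary, and the real target query would go through the backend's address legitimacy hooks rather than a simple range. The model applies the same offset adjustment as the snippet above, then additionally asks whether the offset the access needs at its actual position is encodable.

```c
#include <stdbool.h>

/* Toy target description; not a GCC data structure. */
struct toy_target {
    bool has_post_inc;     /* post-increment loads/stores exist */
    long min_imm_offset;   /* smallest offset allowed in [reg, #imm] */
    long max_imm_offset;   /* largest offset allowed in [reg, #imm] */
};

/* Relative costs, chosen arbitrarily for the illustration. */
enum toy_cost {
    TOY_COST_AUTOINC = 0,
    TOY_COST_REG_OFFSET = 1,
    TOY_COST_EXTRA_INSN = 4
};

/* offset_at_use is the offset from the IV register's value at the
   position of the use (so -4 for a store placed after the increment). */
enum toy_cost toy_address_cost(const struct toy_target *t, long step,
                               long offset_at_use, bool use_after_increment)
{
    /* Same adjustment as in the quoted snippet. */
    long adjusted = offset_at_use + (use_after_increment ? step : 0);
    bool looks_like_post_inc = t->has_post_inc && adjusted == 0;

    /* Additional check: can the target encode the offset the access
       really needs where it sits? */
    bool encodable = offset_at_use >= t->min_imm_offset
                     && offset_at_use <= t->max_imm_offset;

    if (looks_like_post_inc && encodable)
        return TOY_COST_AUTOINC;
    if (encodable)
        return TOY_COST_REG_OFFSET;
    return TOY_COST_EXTRA_INSN;
}
```

With a Thumb-1-like description of { true, 0, 124 } (word loads and stores take an unsigned, word-scaled immediate), the load at offset 0 before the increment keeps the auto-increment cost, while the store at offset -4 after the increment is charged for the extra instruction instead.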
**Issue 2:** IVOPTs also assigns low cost to another candidate:
    Candidate yyy:
      Incr POS: before exit test
      IV struct:
        Type: unsigned int
        Base: (unsigned int) a_13(D)
        Step: 4
This produces:
    loop-preheader:
        r0 = &a[0]
        jump loop-exiting
    loop-header:
        load-from [r0, #-4]
        store-to [r0]
    loop-exiting:
        increment r0
        jump loop-header
IVOPTs considers that the increment in the loop-exiting block can be paired
with "load-from [r0, #-4]" in the loop-header block, despite them being in
different basic blocks.
**Question 2:** Should get_address_cost() verify that the candidate increment
and use->stmt are in the same basic block when cand->pos == IP_NORMAL?
Cross-block pairing seems problematic for post-increment addressing mode
costing.
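As a rough illustration of the co-location test asked about here, the sketch below models a statement by its basic block index and its position within that block. struct toy_stmt and toy_can_pair_post_inc are hypothetical names, not GCC interfaces; in GCC the equivalent information would come from the statements' basic blocks and their order. The sketch also assumes that a post-increment requires the use to come before the increment.

```c
#include <stdbool.h>

/* Toy statement location; not a GCC data structure. */
struct toy_stmt {
    int bb_index;    /* index of the containing basic block */
    int pos_in_bb;   /* position of the statement inside that block */
};

/* A use and an increment can only be merged into one post-increment
   access if they sit in the same block and the use comes first. */
bool toy_can_pair_post_inc(struct toy_stmt use, struct toy_stmt incr)
{
    if (use.bb_index != incr.bb_index)
        return false;
    return use.pos_in_bb < incr.pos_in_bb;
}
```

Under this model, the load in loop-header and the increment in loop-exiting from the example above would not be paired, so the load would not receive the auto-increment cost.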
Both issues suggest that IVOPTs may need additional validation to ensure:
1. The selected addressing mode is actually supported by the target
2. The increment and memory operation are properly co-located for IP_NORMAL
candidates
Best regards,
Ciprian Arbone