Hi Zdenek,
I have a patch set for PR45098.
01_object-size-target.patch
02_pr45098-rtx-cost-set.patch
03_pr45098-computation-cost.patch
04_pr45098-iv-init-cost.patch
05_pr45098-bound-cost.patch
06_pr45098-bound-cost.test.patch
07_pr45098-nowrap-limits-iterations.patch
08_pr45098-nowrap-limits-iterations.test.patch
09_pr45098-shift-add-cost.patch
10_pr45098-shift-add-cost.test.patch
I will send out the patches individually.
The patch set has been bootstrapped and reg-tested on x86_64, and
reg-tested on ARM.
The effect of the patch set on the examples is the removal of one iterator
from the loop, demonstrated below for '-Os -mthumb -march=armv7-a' on example tr4.
tr4.c:
...
extern void foo2 (short*);
void tr4 (short array[], int n)
{
  int i;
  if (n > 0)
    for (i = 0; i < n; i++)
      foo2 (&array[i]);
}
...
tr4.s diff (left without, right with patch):
...
    push {r4, r5, r6, lr}   |     cmp r1, #0
    subs r6, r1, #0         |     push {r3, r4, r5, lr}
    ble .L1                       ble .L1
    mov r5, r0              |     mov r4, r0
    movs r4, #0             |     add r5, r0, r1, lsl #1
.L3:                          .L3:
    mov r0, r5              |     mov r0, r4
    adds r4, r4, #1         |     adds r4, r4, #2
    bl foo2                       bl foo2
    adds r5, r5, #2         |     cmp r4, r5
    cmp r4, r6              <
    bne .L3                       bne .L3
.L1:                          .L1:
    pop {r4, r5, r6, pc}    |     pop {r3, r4, r5, pc}
...
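In source terms, the tr4 diff above corresponds roughly to the rewrite
sketched below. This is a hand-written illustration, not compiler output:
tr4_before, tr4_after, and the recording stub for foo2 are hypothetical names
introduced only so the two loop shapes can be compared. The first version
keeps both a counter and a pointer, as in the left-hand assembly; the second
keeps only the pointer and tests it against a precomputed bound, as in the
right-hand assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the external foo2: records each pointer
   it is called with so the two loop shapes can be compared.  */
static short *seen[16];
static size_t n_seen;

void foo2 (short *p)
{
  if (n_seen < 16)
    seen[n_seen++] = p;
}

/* Shape of the loop without the patch set: two iterators are kept,
   the counter i (compared against n) and the pointer p.  */
void tr4_before (short array[], int n)
{
  short *p = array;
  int i;
  if (n > 0)
    for (i = 0; i != n; i++, p++)
      foo2 (p);
}

/* Shape of the loop with the patch set: the counter is gone, and the
   exit test compares the pointer against the bound array + n.  */
void tr4_after (short array[], int n)
{
  if (n > 0)
    {
      short *p = array;
      short *end = array + n;
      do
        foo2 (p++);
      while (p != end);
    }
}
```

Both versions call foo2 on the same sequence of addresses; the second needs
one register fewer inside the loop, which matches the push/pop lists in the
diff.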
The effect of the patch set on the test cases in terms of size is listed
in the following 2 tables.
---------------------------
-Os -mthumb -march=armv7-a
---------------------------
     without  with  delta
---------------------------
tr1       32    30     -2
tr2       36    36      0
tr3       32    30     -2
tr4       26    26      0
tr5       20    20      0
---------------------------
---------------------------
-Os -march=armv7-a
---------------------------
     without  with  delta
---------------------------
tr1       60    52     -8
tr2       64    60     -4
tr3       60    52     -8
tr4       48    44     -4
tr5       36    32     -4
---------------------------
The size impact on several benchmarks is shown in the following table
(%, lower is better).
                none              pic
           thumb1  thumb2   thumb1  thumb2
spec2000     99.9    99.9     99.9    99.9
eembc        99.9   100.0     99.9   100.1
dhrystone   100.0   100.0    100.0   100.0
coremark     99.3    99.9     99.3   100.0
Thanks,
- Tom