[PATCH, PR45098]

Tom de Vries Tue, 17 May 2011 00:11:41 -0700

Hi Zdenek,

I have a patch set for PR45098.


01_object-size-target.patch
02_pr45098-rtx-cost-set.patch
03_pr45098-computation-cost.patch
04_pr45098-iv-init-cost.patch
05_pr45098-bound-cost.patch
06_pr45098-bound-cost.test.patch
07_pr45098-nowrap-limits-iterations.patch
08_pr45098-nowrap-limits-iterations.test.patch
09_pr45098-shift-add-cost.patch
10_pr45098-shift-add-cost.test.patch

I will send out the patches individually.

The patch set has been bootstrapped and reg-tested on x86_64, and
reg-tested on ARM.

On the examples, the patch set removes one iterator (induction variable)
from the loop, demonstrated below for '-Os -mthumb -march=armv7-a' on
example tr4.

tr4.c:
...
extern void foo2 (short*);
void tr4 (short array[], int n)
{
  int i;
  if (n > 0)
    for (i = 0; i < n; i++)
      foo2 (&array[i]);
}
...

tr4.s diff (left without, right with patch):
...
push    {r4, r5, r6, lr}          |     cmp     r1, #0
subs    r6, r1, #0                |     push    {r3, r4, r5, lr}
ble     .L1                             ble     .L1
mov     r5, r0                    |     mov     r4, r0
movs    r4, #0                    |     add     r5, r0, r1, lsl #1
.L3:                                    .L3:
mov     r0, r5                    |     mov     r0, r4
adds    r4, r4, #1                |     adds    r4, r4, #2
bl      foo2                            bl      foo2
adds    r5, r5, #2                |     cmp     r4, r5
cmp     r4, r6                    <
bne     .L3                             bne     .L3
.L1:                                    .L1:
pop     {r4, r5, r6, pc}          |     pop     {r3, r4, r5, pc}
...
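
At source level, the difference between the two columns can be sketched
roughly as below. This is a hand-written illustration, not compiler output:
the names tr4_two_ivs, tr4_one_iv, seen, seen_count and reset_seen are
hypothetical, and foo2 is given a stand-in body here only so that the
sketch is self-contained. The left column keeps both a counter and a
pointer; the right column keeps only the pointer and compares it against
a bound computed once before the loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for the external foo2: it records the pointers
   it is called with, so both loop shapes can be compared.  */
short *seen[16];
size_t seen_count;

void reset_seen (void)
{
  seen_count = 0;
}

void foo2 (short *p)
{
  if (seen_count < 16)
    seen[seen_count++] = p;
}

/* Shape of the code without the patch set: two iterators, a counter i
   (r4, stepped by 1) and a pointer p (r5, stepped by 2 bytes).  The
   exit test is on the counter.  */
void tr4_two_ivs (short array[], int n)
{
  int i;
  short *p = array;
  if (n > 0)
    for (i = 0; i != n; i++, p++)
      foo2 (p);
}

/* Shape of the code with the patch set: one iterator, the pointer p
   (r4).  The bound end (r5) is array + n, computed once with a single
   shift-add, and the exit test is on the pointer.  */
void tr4_one_iv (short array[], int n)
{
  short *p, *end;
  if (n > 0)
    {
      end = array + n;
      for (p = array; p != end; p++)
        foo2 (p);
    }
}
```

Both functions call foo2 with the same sequence of addresses; the second
form just needs one register less inside the loop.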


The effect of the patch set on the test cases in terms of size is listed
in the following 2 tables.

---------------------------
-Os -mthumb -march=armv7-a
---------------------------
    without    with   delta
---------------------------
tr1      32      30      -2
tr2      36      36       0
tr3      32      30      -2
tr4      26      26       0
tr5      20      20       0
---------------------------

---------------------------
-Os -march=armv7-a
---------------------------
    without    with   delta
---------------------------
tr1      60      52      -8
tr2      64      60      -4
tr3      60      52      -8
tr4      48      44      -4
tr5      36      32      -4
---------------------------


The size impact on several benchmarks is shown in the following table
(%, lower is better).

                     none            pic
                thumb1  thumb2  thumb1 thumb2
spec2000          99.9    99.9    99.9   99.9
eembc             99.9   100.0    99.9  100.1
dhrystone        100.0   100.0   100.0  100.0
coremark          99.3    99.9    99.3  100.0

Thanks,
- Tom
