Hi,
Based on the previous discussion of v2, I'm reposting this patch with
suggested changes for complexity and adding an additional MIPS-specific
assembly test.
The assembly test for this patch scans pattern in the innermost
loop(.L6), where the lwc1 instruction for $f1 must come first, followed
by the lwc1 for $f0. In addition, less computation is required to
perform the lw instructions compared to the case before the patch, which
results in reduced code size. Before the patch the non-optimal set for
iv candidates is chosen.
===== Before the patch =====
.L6:
daddu $3,$9,$2
lwc1 $f0,0($2)
daddu $3,$3,$8
daddiu $2,$2,4
lwc1 $f1,0($3)
addiu $7,$7,1
maddf.s $f0,$f2,$f1
swc1 $f0,-4($2)
bltc $7,$10,.L6
addiu $11,$11,1
bne $6,$11,.L7
daddu $8,$8,$14
===== Before the patch =====
===== After the patch =====
.L6:
lwc1 $f1,0($7)
daddiu $2,$2,4
lwc1 $f0,-4($2)
addiu $3,$3,1
daddiu $7,$7,4
maddf.s $f0,$f2,$f1
swc1 $f0,-4($2)
bltc $3,$8,.L6
addiu $10,$10,1
bne $6,$10,.L7
daddu $9,$9,$13
===== After the patch =====
Before the commit f9f69dd, the complexity was incremented when
aff_inv.offset was not equal to 0. In this patch, if aff_inv is not
constant, we want to check the base + offset addressing mode even if
base + index is not supported. For example, the MIPS architecture does
not support base + index but does support base + offset, which should
therefore be checked.
Looking at the GIMPLE of the test case before and after the patch, we
see Group 0 and Group 1 of type REFERENCE ADDRESSES. The group costs
before and after the patch are shown below:
===== Before the patch =====
<Group-candidate Costs>:
Group 0:
cand cost compl. inv.expr. inv.vars
1 22 0 1; NIL;
2 22 0 2; NIL;
4 2 0 NIL; NIL;
5 2 2 NIL; NIL;
6 16 0 3; NIL;
7 18 0 4; NIL;
9 14 0 1; NIL;
Group 1:
cand cost compl. inv.expr. inv.vars
1 11 0 5; NIL;
2 11 0 6; NIL;
4 8 0 7; NIL;
5 9 0 8; NIL;
6 1 0 NIL; NIL;
7 1 1 NIL; NIL;
9 7 0 5; NIL;
===== Before the patch =====
===== After the patch =====
<Group-candidate Costs>:
Group 0:
cand cost compl. inv.expr. inv.vars
1 22 4 1; NIL;
2 22 2 1; NIL;
4 2 0 NIL; NIL;
5 2 2 NIL; NIL;
6 16 2 2; NIL;
7 16 4 3; NIL;
9 14 4 1; NIL;
Group 1:
cand cost compl. inv.expr. inv.vars
1 11 2 4; NIL;
2 11 1 4; NIL;
4 8 1 5; NIL;
5 8 2 6; NIL;
6 1 0 NIL; NIL;
7 1 1 NIL; NIL;
9 7 2 4; NIL;
===== After the patch =====
The patch results in an optimal set of candidates being chosen. Before
the patch, the optimal set of candidates was 3 and 4; after the patch,
it became 3, 4, and 6. The GIMPLE output before and after the patch of
the test case for the final optimal set is shown below:
===== Before the patch =====
Improved to:
cost: 26 (complexity 0)
reg_cost: 6
cand_cost: 10
cand_group_cost: 10 (complexity 0)
candidates: 3, 4
group:0 --> iv_cand:4, cost=(2,0)
group:1 --> iv_cand:4, cost=(8,0)
group:2 --> iv_cand:3, cost=(0,0)
invariant variables: 5
invariant expressions: 7
Original cost 26 (complexity 0)
Final cost 26 (complexity 0)
===== Before the patch =====
===== After the patch =====
Improved to:
cost: 26 (complexity 0)
reg_cost: 7
cand_cost: 16
cand_group_cost: 3 (complexity 0)
candidates: 3, 4, 6
group:0 --> iv_cand:4, cost=(2,0)
group:1 --> iv_cand:6, cost=(1,0)
group:2 --> iv_cand:3, cost=(0,0)
invariant variables: 5
invariant expressions:
Original cost 26 (complexity 0)
Final cost 26 (complexity 0)
===== After the patch =====
Before the patch, in the innermost loop (<bb 8>) at the GIMPLE level,
there is significantly more computation for _37 = MEM[(float *)_70]
compared to after the patch. After the patch, a new induction variable
is introduced for _37 = MEM[(float *)_70]. The GIMPLE of the test case
is shown below:
===== Before the patch =====
<bb 8> [local count: 955630225]:
# i_56 = PHI <i_40(13), 0(7)>
# ivtmp.10_85 = PHI <ivtmp.10_84(13), ivtmp.10_83(7)>
_78 = (void *) ivtmp.10_85;
_35 = MEM[(float *)_78];
_76 = (sizetype) _7;
_75 = _76 * 18446744073709551612;
_74 = _75 + ivtmp.10_85;
_73 = (void *) _74;
_71 = ivtmp.25_69;
_70 = _73 + _71;
_37 = MEM[(float *)_70];
_38 = t_28 * _37;
_39 = _35 + _38;
_77 = (void *) ivtmp.10_85;
MEM[(float *)_77] = _39;
i_40 = i_56 + 1;
ivtmp.10_84 = ivtmp.10_85 + 4;
_104 = (int) ivtmp.31_17;
if (i_40 < _104)
goto <bb 13>; [89.00%]
else
goto <bb 9>; [11.00%]
===== Before the patch =====
===== After the patch =====
<bb 8> [local count: 955630225]:
# i_56 = PHI <i_40(13), 0(7)>
# ivtmp.10_85 = PHI <ivtmp.10_84(13), ivtmp.10_83(7)>
# ivtmp.12_78 = PHI <ivtmp.12_77(13), ivtmp.12_76(7)>
_71 = (void *) ivtmp.10_85;
_35 = MEM[(float *)_71];
_69 = (void *) ivtmp.12_78;
_37 = MEM[(float *)_69];
_38 = t_28 * _37;
_39 = _35 + _38;
_70 = (void *) ivtmp.10_85;
MEM[(float *)_70] = _39;
i_40 = i_56 + 1;
ivtmp.10_84 = ivtmp.10_85 + 4;
ivtmp.12_77 = ivtmp.12_78 + 4;
_109 = (int) ivtmp.32_16;
if (i_40 < _109)
goto <bb 13>; [89.00%]
else
goto <bb 9>; [11.00%]
===== After the patch =====
Regards,
Radosav