On Wed, Mar 4, 2026 at 2:49 PM Radosav Krunic <[email protected]> wrote: > > Hi, > > Based on the previous discussion of v2, I'm reposting this patch with > suggested changes for complexity and adding an additional MIPS-specific > assembly test. > > The assembly test for this patch scans pattern in the innermost > loop(.L6), where the lwc1 instruction for $f1 must come first, followed > by the lwc1 for $f0. In addition, less computation is required to > perform the lw instructions compared to the case before the patch, which > results in reduced code size. Before the patch the non-optimal set for > iv candidates is chosen. > ===== Before the patch ===== > .L6: > daddu $3,$9,$2 > lwc1 $f0,0($2) > daddu $3,$3,$8 > daddiu $2,$2,4 > lwc1 $f1,0($3) > addiu $7,$7,1 > maddf.s $f0,$f2,$f1 > swc1 $f0,-4($2) > bltc $7,$10,.L6 > addiu $11,$11,1 > bne $6,$11,.L7 > daddu $8,$8,$14 > ===== Before the patch ===== > ===== After the patch ===== > .L6: > lwc1 $f1,0($7) > daddiu $2,$2,4 > lwc1 $f0,-4($2) > addiu $3,$3,1 > daddiu $7,$7,4 > maddf.s $f0,$f2,$f1 > swc1 $f0,-4($2) > bltc $3,$8,.L6 > addiu $10,$10,1 > bne $6,$10,.L7 > daddu $9,$9,$13 > ===== After the patch ===== > > Before the commit f9f69dd, the complexity was incremented when > aff_inv.offset was not equal to 0. In this patch, if aff_inv is not > constant, we want to check the base + offset addressing mode even if > base + index is not supported. For example, the MIPS architecture does > not support base + index but does support base + offset, which should > therefore be checked.
complexity is only a tie breaker for otherwise equal cost IVs. Thus the above reasoning is dubious - if some variant is not checked (considered), it's not because of complexity. It looks more like your target does not cost in a expected way. Richard. > > Looking at the GIMPLE of the test case before and after the patch, we > see Group 0 and Group 1 of type REFERENCE ADDRESSES. The group costs > before and after the patch are shown below: > ===== Before the patch ===== > <Group-candidate Costs>: > Group 0: > cand cost compl. inv.expr. inv.vars > 1 22 0 1; NIL; > 2 22 0 2; NIL; > 4 2 0 NIL; NIL; > 5 2 2 NIL; NIL; > 6 16 0 3; NIL; > 7 18 0 4; NIL; > 9 14 0 1; NIL; > > Group 1: > cand cost compl. inv.expr. inv.vars > 1 11 0 5; NIL; > 2 11 0 6; NIL; > 4 8 0 7; NIL; > 5 9 0 8; NIL; > 6 1 0 NIL; NIL; > 7 1 1 NIL; NIL; > 9 7 0 5; NIL; > ===== Before the patch ===== > ===== After the patch ===== > <Group-candidate Costs>: > Group 0: > cand cost compl. inv.expr. inv.vars > 1 22 4 1; NIL; > 2 22 2 1; NIL; > 4 2 0 NIL; NIL; > 5 2 2 NIL; NIL; > 6 16 2 2; NIL; > 7 16 4 3; NIL; > 9 14 4 1; NIL; > > Group 1: > cand cost compl. inv.expr. inv.vars > 1 11 2 4; NIL; > 2 11 1 4; NIL; > 4 8 1 5; NIL; > 5 8 2 6; NIL; > 6 1 0 NIL; NIL; > 7 1 1 NIL; NIL; > 9 7 2 4; NIL; > ===== After the patch ===== > > The patch results in an optimal set of candidates being chosen. Before > the patch, the optimal set of candidates was 3 and 4; after the patch, > it became 3, 4, and 6. The GIMPLE output before and after the patch of > the test case for the final optimal set is shown below: > ===== Before the patch ===== > Improved to: > cost: 26 (complexity 0) > reg_cost: 6 > cand_cost: 10 > cand_group_cost: 10 (complexity 0) > candidates: 3, 4 > group:0 --> iv_cand:4, cost=(2,0) > group:1 --> iv_cand:4, cost=(8,0) > group:2 --> iv_cand:3, cost=(0,0) > invariant variables: 5 > invariant expressions: 7 > > Original cost 26 (complexity 0) > > Final cost 26 (complexity 0) > ===== Before the patch ===== > ===== After the patch ===== > Improved to: > cost: 26 (complexity 0) > reg_cost: 7 > cand_cost: 16 > cand_group_cost: 3 (complexity 0) > candidates: 3, 4, 6 > group:0 --> iv_cand:4, cost=(2,0) > group:1 --> iv_cand:6, cost=(1,0) > group:2 --> iv_cand:3, cost=(0,0) > invariant variables: 5 > invariant expressions: > > Original cost 26 (complexity 0) > > Final cost 26 (complexity 0) > ===== After the patch ===== > > Before the patch, in the innermost loop (<bb 8>) at the GIMPLE level, > there is significantly more computation for _37 = MEM[(float *)_70] > compared to after the patch. After the patch, a new induction variable > is introduced for _37 = MEM[(float *)_70]. The GIMPLE of the test case > is shown below: > ===== Before the patch ===== > <bb 8> [local count: 955630225]: > # i_56 = PHI <i_40(13), 0(7)> > # ivtmp.10_85 = PHI <ivtmp.10_84(13), ivtmp.10_83(7)> > _78 = (void *) ivtmp.10_85; > _35 = MEM[(float *)_78]; > _76 = (sizetype) _7; > _75 = _76 * 18446744073709551612; > _74 = _75 + ivtmp.10_85; > _73 = (void *) _74; > _71 = ivtmp.25_69; > _70 = _73 + _71; > _37 = MEM[(float *)_70]; > _38 = t_28 * _37; > _39 = _35 + _38; > _77 = (void *) ivtmp.10_85; > MEM[(float *)_77] = _39; > i_40 = i_56 + 1; > ivtmp.10_84 = ivtmp.10_85 + 4; > _104 = (int) ivtmp.31_17; > if (i_40 < _104) > goto <bb 13>; [89.00%] > else > goto <bb 9>; [11.00%] > ===== Before the patch ===== > ===== After the patch ===== > <bb 8> [local count: 955630225]: > # i_56 = PHI <i_40(13), 0(7)> > # ivtmp.10_85 = PHI <ivtmp.10_84(13), ivtmp.10_83(7)> > # ivtmp.12_78 = PHI <ivtmp.12_77(13), ivtmp.12_76(7)> > _71 = (void *) ivtmp.10_85; > _35 = MEM[(float *)_71]; > _69 = (void *) ivtmp.12_78; > _37 = MEM[(float *)_69]; > _38 = t_28 * _37; > _39 = _35 + _38; > _70 = (void *) ivtmp.10_85; > MEM[(float *)_70] = _39; > i_40 = i_56 + 1; > ivtmp.10_84 = ivtmp.10_85 + 4; > ivtmp.12_77 = ivtmp.12_78 + 4; > _109 = (int) ivtmp.32_16; > if (i_40 < _109) > goto <bb 13>; [89.00%] > else > goto <bb 9>; [11.00%] > ===== After the patch ===== > > Regards, > Radosav
