On Wed, Mar 4, 2026 at 2:49 PM Radosav Krunic
<[email protected]> wrote:
>
> Hi,
>
> Based on the previous discussion of v2, I'm reposting this patch with
> suggested changes for complexity and adding an additional MIPS-specific
> assembly test.
>
> The assembly test for this patch scans pattern in the innermost
> loop(.L6), where the lwc1 instruction for $f1 must come first, followed
> by the lwc1 for $f0.  In addition, less computation is required to
> perform the lw instructions compared to the case before the patch, which
> results in reduced code size.  Before the patch the non-optimal set for
> iv candidates is chosen.
> ===== Before the patch =====
> .L6:
>         daddu   $3,$9,$2
>         lwc1    $f0,0($2)
>         daddu   $3,$3,$8
>         daddiu  $2,$2,4
>         lwc1    $f1,0($3)
>         addiu   $7,$7,1
>         maddf.s $f0,$f2,$f1
>         swc1    $f0,-4($2)
>         bltc    $7,$10,.L6
>         addiu   $11,$11,1
>         bne     $6,$11,.L7
>         daddu   $8,$8,$14
> ===== Before the patch =====
> ===== After the patch =====
> .L6:
>         lwc1    $f1,0($7)
>         daddiu  $2,$2,4
>         lwc1    $f0,-4($2)
>         addiu   $3,$3,1
>         daddiu  $7,$7,4
>         maddf.s $f0,$f2,$f1
>         swc1    $f0,-4($2)
>         bltc    $3,$8,.L6
>         addiu   $10,$10,1
>         bne     $6,$10,.L7
>         daddu   $9,$9,$13
> ===== After the patch =====
>
> Before the commit f9f69dd, the complexity was incremented when
> aff_inv.offset was not equal to 0.  In this patch, if aff_inv is not
> constant, we want to check the base + offset addressing mode even if
> base + index is not supported. For example, the MIPS architecture does
> not support base + index but does support base + offset, which should
> therefore be checked.

complexity is only a tie breaker for otherwise equal cost IVs.  Thus the above
reasoning is dubious - if some variant is not checked (considered),
it's not because of
complexity.  It looks more like your target does not cost in a expected way.

Richard.

>
> Looking at the GIMPLE of the test case before and after the patch, we
> see Group 0 and Group 1 of type REFERENCE ADDRESSES. The group costs
> before and after the patch are shown below:
> ===== Before the patch =====
> <Group-candidate Costs>:
> Group 0:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     22      0       1;      NIL;
>   2     22      0       2;      NIL;
>   4     2       0       NIL;    NIL;
>   5     2       2       NIL;    NIL;
>   6     16      0       3;      NIL;
>   7     18      0       4;      NIL;
>   9     14      0       1;      NIL;
>
> Group 1:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     11      0       5;      NIL;
>   2     11      0       6;      NIL;
>   4     8       0       7;      NIL;
>   5     9       0       8;      NIL;
>   6     1       0       NIL;    NIL;
>   7     1       1       NIL;    NIL;
>   9     7       0       5;      NIL;
> ===== Before the patch =====
> ===== After the patch =====
> <Group-candidate Costs>:
> Group 0:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     22      4       1;      NIL;
>   2     22      2       1;      NIL;
>   4     2       0       NIL;    NIL;
>   5     2       2       NIL;    NIL;
>   6     16      2       2;      NIL;
>   7     16      4       3;      NIL;
>   9     14      4       1;      NIL;
>
> Group 1:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     11      2       4;      NIL;
>   2     11      1       4;      NIL;
>   4     8       1       5;      NIL;
>   5     8       2       6;      NIL;
>   6     1       0       NIL;    NIL;
>   7     1       1       NIL;    NIL;
>   9     7       2       4;      NIL;
> ===== After the patch =====
>
> The patch results in an optimal set of candidates being chosen. Before
> the patch, the optimal set of candidates was 3 and 4; after the patch,
> it became 3, 4, and 6. The GIMPLE output before and after the patch of
> the test case for the final optimal set is shown below:
> ===== Before the patch =====
> Improved to:
>   cost: 26 (complexity 0)
>   reg_cost: 6
>   cand_cost: 10
>   cand_group_cost: 10 (complexity 0)
>   candidates: 3, 4
>    group:0 --> iv_cand:4, cost=(2,0)
>    group:1 --> iv_cand:4, cost=(8,0)
>    group:2 --> iv_cand:3, cost=(0,0)
>   invariant variables: 5
>   invariant expressions: 7
>
> Original cost 26 (complexity 0)
>
> Final cost 26 (complexity 0)
> ===== Before the patch =====
> ===== After the patch =====
> Improved to:
>   cost: 26 (complexity 0)
>   reg_cost: 7
>   cand_cost: 16
>   cand_group_cost: 3 (complexity 0)
>   candidates: 3, 4, 6
>    group:0 --> iv_cand:4, cost=(2,0)
>    group:1 --> iv_cand:6, cost=(1,0)
>    group:2 --> iv_cand:3, cost=(0,0)
>   invariant variables: 5
>   invariant expressions:
>
> Original cost 26 (complexity 0)
>
> Final cost 26 (complexity 0)
> ===== After the patch =====
>
> Before the patch, in the innermost loop (<bb 8>) at the GIMPLE level,
> there is significantly more computation for _37 = MEM[(float *)_70]
> compared to after the patch. After the patch, a new induction variable
> is introduced for _37 = MEM[(float *)_70]. The GIMPLE of the test case
> is shown below:
> ===== Before the patch =====
>   <bb 8> [local count: 955630225]:
>   # i_56 = PHI <i_40(13), 0(7)>
>   # ivtmp.10_85 = PHI <ivtmp.10_84(13), ivtmp.10_83(7)>
>   _78 = (void *) ivtmp.10_85;
>   _35 = MEM[(float *)_78];
>   _76 = (sizetype) _7;
>   _75 = _76 * 18446744073709551612;
>   _74 = _75 + ivtmp.10_85;
>   _73 = (void *) _74;
>   _71 = ivtmp.25_69;
>   _70 = _73 + _71;
>   _37 = MEM[(float *)_70];
>   _38 = t_28 * _37;
>   _39 = _35 + _38;
>   _77 = (void *) ivtmp.10_85;
>   MEM[(float *)_77] = _39;
>   i_40 = i_56 + 1;
>   ivtmp.10_84 = ivtmp.10_85 + 4;
>   _104 = (int) ivtmp.31_17;
>   if (i_40 < _104)
>     goto <bb 13>; [89.00%]
>   else
>     goto <bb 9>; [11.00%]
> ===== Before the patch =====
> ===== After the patch =====
>   <bb 8> [local count: 955630225]:
>   # i_56 = PHI <i_40(13), 0(7)>
>   # ivtmp.10_85 = PHI <ivtmp.10_84(13), ivtmp.10_83(7)>
>   # ivtmp.12_78 = PHI <ivtmp.12_77(13), ivtmp.12_76(7)>
>   _71 = (void *) ivtmp.10_85;
>   _35 = MEM[(float *)_71];
>   _69 = (void *) ivtmp.12_78;
>   _37 = MEM[(float *)_69];
>   _38 = t_28 * _37;
>   _39 = _35 + _38;
>   _70 = (void *) ivtmp.10_85;
>   MEM[(float *)_70] = _39;
>   i_40 = i_56 + 1;
>   ivtmp.10_84 = ivtmp.10_85 + 4;
>   ivtmp.12_77 = ivtmp.12_78 + 4;
>   _109 = (int) ivtmp.32_16;
>   if (i_40 < _109)
>     goto <bb 13>; [89.00%]
>   else
>     goto <bb 9>; [11.00%]
> ===== After the patch =====
>
> Regards,
> Radosav

Reply via email to