Hi, 

Based on the previous discussion of v2, I'm reposting this patch with
suggested changes for complexity and adding an additional MIPS-specific
assembly test.

The assembly test for this patch scans pattern in the innermost
loop(.L6), where the lwc1 instruction for $f1 must come first, followed
by the lwc1 for $f0.  In addition, less computation is required to
perform the lw instructions compared to the case before the patch, which
results in reduced code size.  Before the patch the non-optimal set for
iv candidates is chosen.
===== Before the patch =====
.L6:
        daddu   $3,$9,$2
        lwc1    $f0,0($2)
        daddu   $3,$3,$8
        daddiu  $2,$2,4
        lwc1    $f1,0($3)
        addiu   $7,$7,1
        maddf.s $f0,$f2,$f1
        swc1    $f0,-4($2)
        bltc    $7,$10,.L6
        addiu   $11,$11,1
        bne     $6,$11,.L7
        daddu   $8,$8,$14
===== Before the patch =====
===== After the patch =====
.L6:
        lwc1    $f1,0($7)
        daddiu  $2,$2,4
        lwc1    $f0,-4($2)
        addiu   $3,$3,1
        daddiu  $7,$7,4
        maddf.s $f0,$f2,$f1
        swc1    $f0,-4($2)
        bltc    $3,$8,.L6
        addiu   $10,$10,1
        bne     $6,$10,.L7
        daddu   $9,$9,$13
===== After the patch =====

Before the commit f9f69dd, the complexity was incremented when
aff_inv.offset was not equal to 0.  In this patch, if aff_inv is not
constant, we want to check the base + offset addressing mode even if
base + index is not supported. For example, the MIPS architecture does
not support base + index but does support base + offset, which should
therefore be checked.

Looking at the GIMPLE of the test case before and after the patch, we
see Group 0 and Group 1 of type REFERENCE ADDRESSES. The group costs
before and after the patch are shown below:
===== Before the patch =====
<Group-candidate Costs>:
Group 0:
  cand  cost    compl.  inv.expr.       inv.vars
  1     22      0       1;      NIL;
  2     22      0       2;      NIL;
  4     2       0       NIL;    NIL;
  5     2       2       NIL;    NIL;
  6     16      0       3;      NIL;
  7     18      0       4;      NIL;
  9     14      0       1;      NIL;

Group 1:
  cand  cost    compl.  inv.expr.       inv.vars
  1     11      0       5;      NIL;
  2     11      0       6;      NIL;
  4     8       0       7;      NIL;
  5     9       0       8;      NIL;
  6     1       0       NIL;    NIL;
  7     1       1       NIL;    NIL;
  9     7       0       5;      NIL;
===== Before the patch =====
===== After the patch =====
<Group-candidate Costs>:
Group 0:
  cand  cost    compl.  inv.expr.       inv.vars
  1     22      4       1;      NIL;
  2     22      2       1;      NIL;
  4     2       0       NIL;    NIL;
  5     2       2       NIL;    NIL;
  6     16      2       2;      NIL;
  7     16      4       3;      NIL;
  9     14      4       1;      NIL;

Group 1:
  cand  cost    compl.  inv.expr.       inv.vars
  1     11      2       4;      NIL;
  2     11      1       4;      NIL;
  4     8       1       5;      NIL;
  5     8       2       6;      NIL;
  6     1       0       NIL;    NIL;
  7     1       1       NIL;    NIL;
  9     7       2       4;      NIL;
===== After the patch =====

The patch results in an optimal set of candidates being chosen. Before
the patch, the optimal set of candidates was 3 and 4; after the patch,
it became 3, 4, and 6. The GIMPLE output before and after the patch of
the test case for the final optimal set is shown below:
===== Before the patch =====
Improved to:
  cost: 26 (complexity 0)
  reg_cost: 6
  cand_cost: 10
  cand_group_cost: 10 (complexity 0)
  candidates: 3, 4
   group:0 --> iv_cand:4, cost=(2,0)
   group:1 --> iv_cand:4, cost=(8,0)
   group:2 --> iv_cand:3, cost=(0,0)
  invariant variables: 5
  invariant expressions: 7

Original cost 26 (complexity 0)

Final cost 26 (complexity 0)
===== Before the patch =====
===== After the patch =====
Improved to:
  cost: 26 (complexity 0)
  reg_cost: 7
  cand_cost: 16
  cand_group_cost: 3 (complexity 0)
  candidates: 3, 4, 6
   group:0 --> iv_cand:4, cost=(2,0)
   group:1 --> iv_cand:6, cost=(1,0)
   group:2 --> iv_cand:3, cost=(0,0)
  invariant variables: 5
  invariant expressions: 

Original cost 26 (complexity 0)

Final cost 26 (complexity 0)
===== After the patch =====

Before the patch, in the innermost loop (<bb 8>) at the GIMPLE level,
there is significantly more computation for _37 = MEM[(float *)_70]
compared to after the patch. After the patch, a new induction variable
is introduced for _37 = MEM[(float *)_70]. The GIMPLE of the test case
is shown below:
===== Before the patch =====
  <bb 8> [local count: 955630225]:
  # i_56 = PHI <i_40(13), 0(7)>
  # ivtmp.10_85 = PHI <ivtmp.10_84(13), ivtmp.10_83(7)>
  _78 = (void *) ivtmp.10_85;
  _35 = MEM[(float *)_78];
  _76 = (sizetype) _7;
  _75 = _76 * 18446744073709551612;
  _74 = _75 + ivtmp.10_85;
  _73 = (void *) _74;
  _71 = ivtmp.25_69;
  _70 = _73 + _71;
  _37 = MEM[(float *)_70];
  _38 = t_28 * _37;
  _39 = _35 + _38;
  _77 = (void *) ivtmp.10_85;
  MEM[(float *)_77] = _39;
  i_40 = i_56 + 1;
  ivtmp.10_84 = ivtmp.10_85 + 4;
  _104 = (int) ivtmp.31_17;
  if (i_40 < _104)
    goto <bb 13>; [89.00%]
  else
    goto <bb 9>; [11.00%]
===== Before the patch =====
===== After the patch =====
  <bb 8> [local count: 955630225]:
  # i_56 = PHI <i_40(13), 0(7)>
  # ivtmp.10_85 = PHI <ivtmp.10_84(13), ivtmp.10_83(7)>
  # ivtmp.12_78 = PHI <ivtmp.12_77(13), ivtmp.12_76(7)>
  _71 = (void *) ivtmp.10_85;
  _35 = MEM[(float *)_71];
  _69 = (void *) ivtmp.12_78;
  _37 = MEM[(float *)_69];
  _38 = t_28 * _37;
  _39 = _35 + _38;
  _70 = (void *) ivtmp.10_85;
  MEM[(float *)_70] = _39;
  i_40 = i_56 + 1;
  ivtmp.10_84 = ivtmp.10_85 + 4;
  ivtmp.12_77 = ivtmp.12_78 + 4;
  _109 = (int) ivtmp.32_16;
  if (i_40 < _109)
    goto <bb 13>; [89.00%]
  else
    goto <bb 9>; [11.00%]
===== After the patch =====

Regards,
Radosav
  • [no subject] Radosav Krunic
    • Re: Richard Biener

Reply via email to