https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67732

--- Comment #2 from Kazumoto Kojima <kkojima at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #1)
> There are some improvements with LRA, but the regression in teem-1.6.0-src
> 
>      total:   1105728 -> 1122288    +16560 / +1.497656 %
> 
> outweighs all of the improvements.

teem shows problems with -mlra.  I'm not sure whether all of them
are pure RA issues, though.  The typical inflation with LRA in
teem comes from operations on float matrices.  An example

void
foo (float mat[])
{
  float *p = &mat[16];

  *--p = 0.0;  *--p = 0.0;  *--p = 0.0;  *--p = 0.0;
  *--p = 0.0;  *--p = 0.0;  *--p = 0.0;  *--p = 0.0;
}

shows a ~20% code size regression at -O2.  In the .ira dump, it looks like:

r165 := 0.0
...
r167 := r162 + 56
mem[r167] := r165
r170 := r162 + 52
mem[r170] := r165
r173 := r162 + 48
mem[r173] := r165
...

and the old reload generates:

fr1 := 0.0
...
r1 := r4
r1 := r1 + 56
mem[r1] := fr1
r1 := r4
r1 := r1 + 52
mem[r1] := fr1
r1 := r4
r1 := r1 + 48
mem[r1] := fr1
...

Then postreload fixes it up to

fr1 := 0.0
...
r1 := r1 + 56
mem[r1] := fr1
r1 := r1 - 4
mem[r1] := fr1
r1 := r1 - 4
mem[r1] := fr1
...
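
The fix-up above can be modelled as a small rewrite over the address
computations.  This is only a sketch, not the actual postreload code:
the struct addr_insn type and the rewrite_as_deltas function are
hypothetical names, and the model assumes a single scratch register whose
value is known after every insn.  Each "reg := base + K" after the first
becomes "reg := reg + (K - previous K)", so it no longer needs the copy
from the base register.

```c
#include <assert.h>

/* One address computation from the dumps above.
   Hypothetical type, used for illustration only.  */
struct addr_insn {
    int offset;    /* before: offset from the base; after: see is_delta */
    int is_delta;  /* 1 if the insn now reads "reg := reg + offset" */
};

/* Rewrite every insn after the first as a delta against the value the
   scratch register already holds.  Returns the number of rewritten insns.
   Assumption: nothing else clobbers the scratch register in between.  */
static int rewrite_as_deltas(struct addr_insn *insns, int n)
{
    int known = 0;      /* offset currently held in the scratch register */
    int rewritten = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int target = insns[i].offset;
        if (i > 0) {
            insns[i].offset = target - known;  /* e.g. 52 - 56 = -4 */
            insns[i].is_delta = 1;
            rewritten++;
        }
        known = target;
    }
    return rewritten;
}
```

Applied to the offsets 56, 52, 48 from the example, the second and third
insns both become "reg := reg - 4", matching the postreload output above.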

OTOH, LRA generates:
fr1 := 0.0
...
r1 := r4 + 56
mem[r1] := fr1
r1 := r4 + 52
mem[r1] := fr1
r1 := r4 + 48
mem[r1] := fr1
...

Unfortunately postreload doesn't fix this up, and split2 then turns it
into code similar to that generated by the old reload.
I ran postreload_cse a second time just after pass_split_after_reload
and, with -mlra, got the same text size as with the old reload for the
above example.  The CSiBE total sizes are:

3303471  -mno-lra without 2nd postreload_cse
3315689  -mlra without 2nd postreload_cse
3293271  -mno-lra with 2nd postreload_cse
3299813  -mlra with 2nd postreload_cse

and the teem-only comparison is:

1079932  -mno-lra without 2nd postreload_cse
1094468  -mlra without 2nd postreload_cse
1073104  -mno-lra with 2nd postreload_cse
1082764  -mlra with 2nd postreload_cse
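
To compare the rows above, the gap between -mno-lra and -mlra can be
computed as an absolute and a relative difference.  A minimal sketch;
struct size_delta and compute_size_delta are hypothetical helper names,
not part of CSiBE or GCC.

```c
#include <assert.h>

/* Size difference between two builds, in bytes and in percent of the
   baseline.  Hypothetical helper type for illustration.  */
struct size_delta {
    long bytes;
    double percent;
};

/* base:  text size of the baseline build (here: -mno-lra)
   other: text size of the build being compared (here: -mlra)  */
static struct size_delta compute_size_delta(long base, long other)
{
    struct size_delta d;

    assert(base > 0);
    d.bytes = other - base;
    d.percent = 100.0 * (double)d.bytes / (double)base;
    return d;
}
```

For the CSiBE totals, the LRA gap is 12218 bytes without the second
postreload_cse and 6542 bytes with it; for teem alone it is 14536 and
9660 bytes respectively.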
