[Bug target/109874] [SH] GCC 13's -Os code is 50% bigger than GCC 4's

2023-07-07 Thread olegendo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

Oleg Endo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |olegendo at gcc dot gnu.org

--- Comment #3 from Oleg Endo  ---
(In reply to Richard Biener from comment #2)
> It looks like the target cannot do arbitrary constant shifts so it benefits
> from shifting incrementally.  Even if that is exposed early enough for CSE
> the optimal sequences for shifting by 10, 11, 12 and 13 could prevent CSE
> here.

That's right.  SH1 and SH2 don't have a barrel shifter and need stitched
constant shifts.  In some cases we resort to a runtime library call to avoid
code bloat.
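
To make "stitched" concrete, here is a minimal sketch in C.  It assumes the single-instruction logical right shifts are by 16, 8, 2 and 1 (shlr16, shlr8, shlr2, shlr) and splits a constant shift greedily, largest step first.  The helper names `stitch_shift` and `apply_steps` are hypothetical, and the sequences GCC's SH backend actually picks may differ from a plain greedy split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step sizes assumed to be available as single instructions
   (shlr16, shlr8, shlr2, shlr), largest first. */
static const unsigned sh_steps[] = { 16, 8, 2, 1 };

/* Greedily split a constant logical right shift by `amount` (0..31)
   into single-instruction steps.  Writes the steps to `out`, which
   must hold at least 8 entries, and returns how many were written.
   Hypothetical helper for illustration, not GCC code. */
size_t stitch_shift(unsigned amount, unsigned out[8])
{
    assert(amount < 32);
    size_t n = 0;
    for (size_t i = 0; i < sizeof sh_steps / sizeof sh_steps[0]; i++) {
        while (amount >= sh_steps[i]) {
            out[n++] = sh_steps[i];
            amount -= sh_steps[i];
        }
    }
    return n;
}

/* Apply the steps one at a time, the way the stitched instruction
   sequence would. */
uint32_t apply_steps(uint32_t x, const unsigned *steps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x >>= steps[i];
    return x;
}
```

Under these assumptions a shift by 10 becomes the steps 8, 2 and a shift by 13 becomes 8, 2, 2, 1.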

There are a couple of opportunities when sharing intermediate results of
incremental / stitched shifts.  A while ago I had the idea of writing an RTL
pass that would try to figure that out...

In this case the shifts are expanded to RTL with the constant shift amounts
already propagated and the incremental shifts removed, so it's a bit harder to
undo this at the RTL level, but not impossible.

On SH3 and SH4 dynamic shifts are available, but they require another register
and a constant load.  Incremental / stitched shifts would always be better on
SH for this test case.
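
For illustration, the two shapes of the computation can be written out at the source level.  This is a sketch reconstructed from the assembly listing in comment #1 (shifts by 10, 11, 12 and 13, each followed by a subtract of 1 and a store), not the reporter's actual test case.  The global names come from that listing; the function names and the `uint32_t` parameter type are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Names taken from the assembly listing (without the leading underscore). */
uint32_t CHRmask1, CHRmask2, CHRmask4, CHRmask8;

/* Merged form: every shift starts again from `size`, so no
   intermediate result is shared between the four stores. */
void setup_merged(uint32_t size)
{
    CHRmask1 = (size >> 10) - 1;
    CHRmask2 = (size >> 11) - 1;
    CHRmask4 = (size >> 12) - 1;
    CHRmask8 = (size >> 13) - 1;
}

/* Incremental form: each result continues from the previous one,
   so only the first shift needs a multi-instruction sequence. */
void setup_incremental(uint32_t size)
{
    uint32_t t = size >> 10;
    CHRmask1 = t - 1;
    t >>= 1;
    CHRmask2 = t - 1;
    t >>= 1;
    CHRmask4 = t - 1;
    t >>= 1;
    CHRmask8 = t - 1;
}

/* Check that both forms store the same four values for `size`. */
void check_equivalent(uint32_t size)
{
    setup_merged(size);
    uint32_t m1 = CHRmask1, m2 = CHRmask2, m4 = CHRmask4, m8 = CHRmask8;
    setup_incremental(size);
    assert(m1 == CHRmask1 && m2 == CHRmask2);
    assert(m4 == CHRmask4 && m8 == CHRmask8);
}
```

Both functions store the same values; the difference is only in how much shifting work can be reused.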

2023-05-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-17
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
             Target|                            |sh*

--- Comment #2 from Richard Biener  ---
It looks like the target cannot do arbitrary constant shifts so it benefits
from shifting incrementally.  Even if that is exposed early enough for CSE the
optimal sequences for shifting by 10, 11, 12 and 13 could prevent CSE here.

I'm not sure if there are other targets affected but this is a "global"
optimization problem which for example also affects optimal power expansion.

Generally strength-reduction techniques apply to improve this kind of
thing, possibly in a machine dependent pass.
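
A pass of that kind would need some cost estimate to decide when chaining pays off.  The following is a rough sketch, assuming single-instruction shift steps of 16, 8, 2 and 1 and a greedy split; `shift_cost`, `cost_independent` and `cost_chained` are hypothetical names, and the model ignores the register copies needed to keep an intermediate value alive.

```c
#include <assert.h>
#include <stddef.h>

/* Number of single-instruction steps (16, 8, 2, 1) a greedy split
   needs for a constant shift by `amount`.  Assumed cost model. */
static unsigned shift_cost(unsigned amount)
{
    assert(amount < 32);
    unsigned n = amount / 16;
    amount %= 16;
    n += amount / 8;
    amount %= 8;
    n += amount / 2;
    amount %= 2;
    return n + amount;
}

/* Cost when every shift restarts from the original value. */
unsigned cost_independent(const unsigned *amounts, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += shift_cost(amounts[i]);
    return total;
}

/* Cost when each shift continues from the previous result.
   `amounts` must be sorted in ascending order. */
unsigned cost_chained(const unsigned *amounts, size_t n)
{
    unsigned total = 0, prev = 0;
    for (size_t i = 0; i < n; i++) {
        assert(amounts[i] >= prev);
        total += shift_cost(amounts[i] - prev);
        prev = amounts[i];
    }
    return total;
}
```

For the shift amounts 10, 11, 12 and 13 this model gives 12 shift instructions when each shift is computed independently and 5 when they are chained.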

The regression was likely introduced when merging the shifts at the GIMPLE
level without considering the uses of the intermediate values (after the
transform the values can be computed in parallel since the dependency chains
are shortened).

2023-05-16 Thread dkm at gcc dot gnu.org via Gcc-bugs

--- Comment #1 from Marc Poulhiès  ---
Forcing GCC 13 to emit non-PIC code (as GCC 4 does) shaves a few insns, bringing
it down to 28.

```
_SetupCartCHRMapping:
mov r4,r1
mov.l   .L3,r2
shlr8   r1
shlr2   r1
add #-1,r1
mov.l   r1,@r2
mov r4,r1
shlr8   r1
mov.l   .L4,r2
shlr    r1
shlr2   r1
add #-1,r1
mov.l   r1,@r2
mov r4,r1
shlr8   r1
mov.l   .L5,r2
shlr2   r1
shlr2   r1
shlr8   r4
add #-1,r1
shlr2   r4
mov.l   r1,@r2
shlr    r4
mov.l   .L6,r1
shlr2   r4
add #-1,r4
rts 
mov.l   r4,@r1
.L3:
.long   _CHRmask1
.L4:
.long   _CHRmask2
.L5:
.long   _CHRmask4
.L6:
.long   _CHRmask8
_CHRmask8:
.zero   4
_CHRmask4:
.zero   4
_CHRmask2:
.zero   4
_CHRmask1:
.zero   4
```