[Bug target/109874] [SH] GCC 13's -Os code is 50% bigger than GCC 4's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

Oleg Endo changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |olegendo at gcc dot gnu.org

--- Comment #3 from Oleg Endo ---
(In reply to Richard Biener from comment #2)
> It looks like the target cannot do arbitrary constant shifts so it benefits
> from shifting incrementally.  Even if that is exposed early enough for CSE
> the optimal sequences for shifting by 10, 11, 12 and 13 could prevent CSE
> here.

That's right.  SH1 and SH2 don't have a barrel shifter and need stitched
constant shifts.  In some cases we resort to a runtime library call to avoid
code bloat.

There are a couple of opportunities for sharing intermediate results of
incremental / stitched shifts.  A while ago I had the idea of writing an RTL
pass that would try to figure that out...

In this case the shifts are expanded to RTL with the constant shift amounts
already propagated and the incremental shifts removed, so it's a bit harder to
undo this at the RTL level, but not impossible.

On SH3 and SH4 dynamic shifts are available, but they require another register
plus a constant load.  Incremental / stitched shifts would always be better on
SH for this test case.
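
To make the cost argument above concrete, here is a rough model of stitched shifts. It assumes only the fixed logical right-shift steps of 1, 2, 8 and 16 bits (shlr, shlr2, shlr8, shlr16) and picks them greedily. The helper names are hypothetical and this is not the table GCC's SH backend actually uses; register copies and constant loads are ignored.

```c
#include <stddef.h>

/* Fixed right-shift step sizes assumed for the model: shlr16, shlr8, shlr2, shlr. */
static const unsigned sh_steps[] = { 16, 8, 2, 1 };

/* Hypothetical helper: number of shift instructions needed to shift by a
   constant amount when steps are chosen greedily, largest first.
   This is a simplified model, not GCC's actual selection logic. */
unsigned stitched_shift_count(unsigned amount)
{
    unsigned count = 0;
    for (size_t i = 0; i < sizeof sh_steps / sizeof sh_steps[0]; i++) {
        while (amount >= sh_steps[i]) {
            amount -= sh_steps[i];
            count++;
        }
    }
    return count;
}

/* Total shift instructions when every result is computed from the
   original value, independently of the others. */
unsigned independent_cost(const unsigned *amounts, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += stitched_shift_count(amounts[i]);
    return total;
}

/* Total shift instructions when each result is derived from the previous
   one.  Assumes the amounts are sorted in ascending order. */
unsigned incremental_cost(const unsigned *amounts, size_t n)
{
    unsigned total = 0, prev = 0;
    for (size_t i = 0; i < n; i++) {
        total += stitched_shift_count(amounts[i] - prev);
        prev = amounts[i];
    }
    return total;
}
```

For the shift amounts in this report (10, 11, 12, 13), the model gives 12 shift instructions when each value is computed independently and 5 when the intermediate results are shared.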
[Bug target/109874] [SH] GCC 13's -Os code is 50% bigger than GCC 4's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-17
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
             Target|                            |sh*

--- Comment #2 from Richard Biener ---
It looks like the target cannot do arbitrary constant shifts so it benefits
from shifting incrementally.  Even if that is exposed early enough for CSE
the optimal sequences for shifting by 10, 11, 12 and 13 could prevent CSE
here.

I'm not sure if there are other targets affected but this is a "global"
optimization problem which for example also affects optimal power expansion.
Generally strength-reduction techniques apply to improve these kinds of things,
possibly in a machine dependent pass.

The regression was likely introduced when merging the shifts at the GIMPLE
level without considering the uses of the intermediate values (after the
transform the values can be computed in parallel since the dependency chains
are shortened).
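
The two forms being discussed can be sketched in C as follows. This is a reconstruction inferred from the symbol names and shift amounts in the assembly of comment #1, not the reporter's actual test case; the function names and the `unsigned` types are assumptions.

```c
/* Globals named after the symbols in the generated assembly (assumed type). */
unsigned CHRmask1, CHRmask2, CHRmask4, CHRmask8;

/* Merged form: every shift starts from the original value, so the four
   results are independent and have short dependency chains. */
void setup_merged(unsigned size)
{
    CHRmask1 = (size >> 10) - 1;
    CHRmask2 = (size >> 11) - 1;
    CHRmask4 = (size >> 12) - 1;
    CHRmask8 = (size >> 13) - 1;
}

/* Incremental form: each result reuses the previous shifted value, so only
   one extra single-bit shift is needed per additional result. */
void setup_incremental(unsigned size)
{
    size >>= 10;
    CHRmask1 = size - 1;
    size >>= 1;
    CHRmask2 = size - 1;
    size >>= 1;
    CHRmask4 = size - 1;
    size >>= 1;
    CHRmask8 = size - 1;
}
```

Both functions store the same values; they differ only in whether the intermediate shift results are reused, which is what matters on a target without arbitrary constant shifts.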
[Bug target/109874] [SH] GCC 13's -Os code is 50% bigger than GCC 4's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109874

--- Comment #1 from Marc Poulhiès ---
Forcing GCC 13 to emit non-PIC (as gcc4) code shaves a few insns, down to 28.

```
_SetupCartCHRMapping:
        mov     r4,r1
        mov.l   .L3,r2
        shlr8   r1
        shlr2   r1
        add     #-1,r1
        mov.l   r1,@r2
        mov     r4,r1
        shlr8   r1
        mov.l   .L4,r2
        shlr    r1
        shlr2   r1
        add     #-1,r1
        mov.l   r1,@r2
        mov     r4,r1
        shlr8   r1
        mov.l   .L5,r2
        shlr2   r1
        shlr2   r1
        shlr8   r4
        add     #-1,r1
        shlr2   r4
        mov.l   r1,@r2
        shlr    r4
        mov.l   .L6,r1
        shlr2   r4
        add     #-1,r4
        rts
        mov.l   r4,@r1
.L3:
        .long   _CHRmask1
.L4:
        .long   _CHRmask2
.L5:
        .long   _CHRmask4
.L6:
        .long   _CHRmask8
_CHRmask8:
        .zero   4
_CHRmask4:
        .zero   4
_CHRmask2:
        .zero   4
_CHRmask1:
        .zero   4
```