Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work
On Tue, Feb 26, 2019 at 08:13:32PM -0700, Jeff Law wrote: > My point is we see this stuff all the time on common platforms with > simple -O2 optimization. It deleted something like 2k dead insns on an > x86_64 bootstrap before I added goof'd up trap_p test. What I don't > have a sense of is how often removing that trivially dead code > ultimately helped cprop on x86_64. But the amount of trivially dead > code on a standard -O2 bootstrap was a huge surprise. It is a big surprise. I wonder what causes it; I don't see it? Do you have an example maybe? Or are you talking not about the subreg thing, but just about the dead code? Most of that will be deleted a little later anyway. Maybe some more places should call delete_trivially_dead_insns (). Segher
Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work
On 2/26/19 5:18 PM, Segher Boessenkool wrote: > On Tue, Feb 26, 2019 at 04:58:17PM -0700, Jeff Law wrote: >> On 2/26/19 4:19 PM, Segher Boessenkool wrote: >>> On Tue, Feb 26, 2019 at 10:07:54AM -0700, Jeff Law wrote: >> For gcc-10 we should: >> >> 1. Avoid really stupid stuff in init-regs. >> >> 2. Avoid really stupid stuff in the splitters. >> >> 3. Make REE SUBREG-aware >> >> 4. Seriously consider a RTL DCE pass after post-reload splitting >> >> But none of those would actually fix this BZ :-) > > Since we are talking about pie-in-the-sky stuff... I'd like to run some > kind of mini-combine after every split pass (just on the new insns). As > it is the (later) splitters often leave non-optimal code unless a lot of > work is put into those splitters. We should not have as many late > splitters as we do, but for those we do need, it would be helpful if the > compiler could do simple combinations without having to hand-code them. #1 is downright trivial. #2 is pretty easy as well. #3 could be painful, but is probably worth it. #4 is trivial, but would probably have a measurable compile-time cost. > >>> [ I don't see any problems with -O2 btw, only with -O1, so should we care >>> at all anyway? ] >> You can get this stuff at -O2 on other platforms. It's far more common >> than I ever imagined. > > Yeah, that's because the testcase uses -fno-split-wide-types. The subreg > passes do great work :-) My point is we see this stuff all the time on common platforms with simple -O2 optimization. It deleted something like 2k dead insns on an x86_64 bootstrap before I added goof'd up trap_p test. What I don't have a sense of is how often removing that trivially dead code ultimately helped cprop on x86_64. But the amount of trivially dead code on a standard -O2 bootstrap was a huge surprise. Jeff
Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work
On Tue, Feb 26, 2019 at 04:58:17PM -0700, Jeff Law wrote: > On 2/26/19 4:19 PM, Segher Boessenkool wrote: > > On Tue, Feb 26, 2019 at 10:07:54AM -0700, Jeff Law wrote: > For gcc-10 we should: > > 1. Avoid really stupid stuff in init-regs. > > 2. Avoid really stupid stuff in the splitters. > > 3. Make REE SUBREG-aware > > 4. Seriously consider a RTL DCE pass after post-reload splitting > > But none of those would actually fix this BZ :-) Since we are talking about pie-in-the-sky stuff... I'd like to run some kind of mini-combine after every split pass (just on the new insns). As it is the (later) splitters often leave non-optimal code unless a lot of work is put into those splitters. We should not have as many late splitters as we do, but for those we do need, it would be helpful if the compiler could do simple combinations without having to hand-code them. > > [ I don't see any problems with -O2 btw, only with -O1, so should we care > > at all anyway? ] > You can get this stuff at -O2 on other platforms. It's far more common > than I ever imagined. Yeah, that's because the testcase uses -fno-split-wide-types. The subreg passes do great work :-) Segher
Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work
On 2/26/19 4:19 PM, Segher Boessenkool wrote: > On Tue, Feb 26, 2019 at 10:07:54AM -0700, Jeff Law wrote: >> As we enter regcprop we have the following horrific RTL: (insn 35 7 36 2 (set (reg:DI 2 $2 [orig:200 x ] [200]) (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit} (nil)) (insn 36 35 10 2 (set (reg:DI 3 $3 [ x+8 ]) (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit} (nil)) (insn 10 36 11 2 (set (reg:DI 2 $2 [orig:200 x ] [200]) (reg:DI 4 $4 [206])) "j.c":8:49 313 {*movdi_64bit} (nil)) (insn 11 10 37 2 (set (reg:DI 3 $3 [ x+8 ]) (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit} (nil)) > > where 35+36 used to be a single set in TImode: > insn_cost 4 for29: r200:TI=0 > insn_cost 4 for10: r200:TI#0=r197:DI > REG_DEAD r197:DI > insn_cost 4 for11: r200:TI#8=0 > > I would hope combine could clean this up, and it tried, but That was one of my early thoughts... But > > Trying 29, 10 -> 11: >29: r200:TI=0 >10: r200:TI#0=r206:DI > REG_DEAD r206:DI >11: r200:TI#8=0 > Can't combine i2 into i3 > > This is because i2 (that's insn 10) has a subreg on the LHS. Right. Already went down that rathole and several others :-) > >> You'll note there's a fair amount of obviously dead code. The dead code >> really hampers regcprop's ability to propagate values. > > And prevents all of the RTL optimisers before it from doing any useful > work. Ideally this should be fixed much earlier. Not that your patch > isn't pretty obviously correct :-) For gcc-10 we should: 1. Avoid really stupid stuff in init-regs. 2. Avoid really stupid stuff in the splitters. 3. Make REE SUBREG-aware 4. Seriously consider a RTL DCE pass after post-reload splitting But none of those would actually fix this BZ :-) > > [ I don't see any problems with -O2 btw, only with -O1, so should we care > at all anyway? ] You can get this stuff at -O2 on other platforms. It's far more common than I ever imagined. Jeff ps. THe patch isn't actually correct. I committed the wrong version. We should be looking at notrap_p on SRC, not INSN. WHen you call it on the insn, the EXPR_LIST for the notes is considered trapping. Ugh. I didn't notice it until I started trying to clean up the 2nd patch and was seeing dead code much later than I expected :( I'll fix that goof tomorrow.
Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work
On Tue, Feb 26, 2019 at 10:07:54AM -0700, Jeff Law wrote: > As we enter regcprop we have the following horrific RTL: > >> (insn 35 7 36 2 (set (reg:DI 2 $2 [orig:200 x ] [200]) > >> (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit} > >> (nil)) > >> (insn 36 35 10 2 (set (reg:DI 3 $3 [ x+8 ]) > >> (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit} > >> (nil)) > >> (insn 10 36 11 2 (set (reg:DI 2 $2 [orig:200 x ] [200]) > >> (reg:DI 4 $4 [206])) "j.c":8:49 313 {*movdi_64bit} > >> (nil)) > >> (insn 11 10 37 2 (set (reg:DI 3 $3 [ x+8 ]) > >> (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit} > >> (nil)) where 35+36 used to be a single set in TImode: insn_cost 4 for29: r200:TI=0 insn_cost 4 for10: r200:TI#0=r197:DI REG_DEAD r197:DI insn_cost 4 for11: r200:TI#8=0 I would hope combine could clean this up, and it tried, but Trying 29, 10 -> 11: 29: r200:TI=0 10: r200:TI#0=r206:DI REG_DEAD r206:DI 11: r200:TI#8=0 Can't combine i2 into i3 This is because i2 (that's insn 10) has a subreg on the LHS. > You'll note there's a fair amount of obviously dead code. The dead code > really hampers regcprop's ability to propagate values. And prevents all of the RTL optimisers before it from doing any useful work. Ideally this should be fixed much earlier. Not that your patch isn't pretty obviously correct :-) [ I don't see any problems with -O2 btw, only with -O1, so should we care at all anyway? ] Segher