Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work

2019-02-27 Thread Segher Boessenkool
On Tue, Feb 26, 2019 at 08:13:32PM -0700, Jeff Law wrote:
> My point is we see this stuff all the time on common platforms with
> simple -O2 optimization.  It deleted something like 2k dead insns on an
> x86_64 bootstrap before I added goof'd up trap_p test.  What I don't
> have a sense of is how often removing that trivially dead code
> ultimately helped cprop on x86_64.  But the amount of trivially dead
> code on a standard -O2 bootstrap was a huge surprise.

It is a big surprise.  I wonder what causes it; I don't see it?  Do you
have an example maybe?

Or are you talking not about the subreg thing, but just about the dead
code?  Most of that will be deleted a little later anyway.  Maybe some
more places should call delete_trivially_dead_insns ().


Segher


Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work

2019-02-26 Thread Jeff Law
On 2/26/19 5:18 PM, Segher Boessenkool wrote:
> On Tue, Feb 26, 2019 at 04:58:17PM -0700, Jeff Law wrote:
>> On 2/26/19 4:19 PM, Segher Boessenkool wrote:
>>> On Tue, Feb 26, 2019 at 10:07:54AM -0700, Jeff Law wrote:
>> For gcc-10 we should:
>>
>>   1. Avoid really stupid stuff in init-regs.
>>
>>   2. Avoid really stupid stuff in the splitters.
>>
>>   3. Make REE SUBREG-aware
>>
>>   4. Seriously consider a RTL DCE pass after post-reload splitting
>>
>> But none of those would actually fix this BZ :-)
> 
> Since we are talking about pie-in-the-sky stuff...  I'd like to run some
> kind of mini-combine after every split pass (just on the new insns).  As
> it is the (later) splitters often leave non-optimal code unless a lot of
> work is put into those splitters.  We should not have as many late
> splitters as we do, but for those we do need, it would be helpful if the
> compiler could do simple combinations without having to hand-code them.
#1 is downright trivial.  #2 is pretty easy as well.  #3 could be
painful, but is probably worth it.  #4 is trivial, but would probably
have a measurable compile-time cost.

> 
>>> [ I don't see any problems with -O2 btw, only with -O1, so should we care
>>> at all anyway? ]
>> You can get this stuff at -O2 on other platforms.  It's far more common
>> than I ever imagined.
> 
> Yeah, that's because the testcase uses -fno-split-wide-types.  The subreg
> passes do great work :-)
My point is we see this stuff all the time on common platforms with
simple -O2 optimization.  It deleted something like 2k dead insns on an
x86_64 bootstrap before I added goof'd up trap_p test.  What I don't
have a sense of is how often removing that trivially dead code
ultimately helped cprop on x86_64.  But the amount of trivially dead
code on a standard -O2 bootstrap was a huge surprise.

Jeff


Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work

2019-02-26 Thread Segher Boessenkool
On Tue, Feb 26, 2019 at 04:58:17PM -0700, Jeff Law wrote:
> On 2/26/19 4:19 PM, Segher Boessenkool wrote:
> > On Tue, Feb 26, 2019 at 10:07:54AM -0700, Jeff Law wrote:
> For gcc-10 we should:
> 
>   1. Avoid really stupid stuff in init-regs.
> 
>   2. Avoid really stupid stuff in the splitters.
> 
>   3. Make REE SUBREG-aware
> 
>   4. Seriously consider a RTL DCE pass after post-reload splitting
> 
> But none of those would actually fix this BZ :-)

Since we are talking about pie-in-the-sky stuff...  I'd like to run some
kind of mini-combine after every split pass (just on the new insns).  As
it is the (later) splitters often leave non-optimal code unless a lot of
work is put into those splitters.  We should not have as many late
splitters as we do, but for those we do need, it would be helpful if the
compiler could do simple combinations without having to hand-code them.

> > [ I don't see any problems with -O2 btw, only with -O1, so should we care
> > at all anyway? ]
> You can get this stuff at -O2 on other platforms.  It's far more common
> than I ever imagined.

Yeah, that's because the testcase uses -fno-split-wide-types.  The subreg
passes do great work :-)


Segher


Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work

2019-02-26 Thread Jeff Law
On 2/26/19 4:19 PM, Segher Boessenkool wrote:
> On Tue, Feb 26, 2019 at 10:07:54AM -0700, Jeff Law wrote:
>> As we enter regcprop we have the following horrific RTL:
 (insn 35 7 36 2 (set (reg:DI 2 $2 [orig:200 x ] [200])
 (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit}
  (nil))
 (insn 36 35 10 2 (set (reg:DI 3 $3 [ x+8 ])
 (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit}
  (nil))
 (insn 10 36 11 2 (set (reg:DI 2 $2 [orig:200 x ] [200])
 (reg:DI 4 $4 [206])) "j.c":8:49 313 {*movdi_64bit}
  (nil))
 (insn 11 10 37 2 (set (reg:DI 3 $3 [ x+8 ])
 (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit}
  (nil))
> 
> where 35+36 used to be a single set in TImode:
> insn_cost 4 for29: r200:TI=0
> insn_cost 4 for10: r200:TI#0=r197:DI
>   REG_DEAD r197:DI
> insn_cost 4 for11: r200:TI#8=0
> 
> I would hope combine could clean this up, and it tried, but
That was one of my early thoughts...  But

> 
> Trying 29, 10 -> 11:
>29: r200:TI=0
>10: r200:TI#0=r206:DI
>   REG_DEAD r206:DI
>11: r200:TI#8=0
> Can't combine i2 into i3
> 
> This is because i2 (that's insn 10) has a subreg on the LHS.
Right.  Already went down that rathole and several others :-)

> 
>> You'll note there's a fair amount of obviously dead code.  The dead code
>> really hampers regcprop's ability to propagate values.
> 
> And prevents all of the RTL optimisers before it from doing any useful
> work.  Ideally this should be fixed much earlier.  Not that your patch
> isn't pretty obviously correct :-)
For gcc-10 we should:

  1. Avoid really stupid stuff in init-regs.

  2. Avoid really stupid stuff in the splitters.

  3. Make REE SUBREG-aware

  4. Seriously consider a RTL DCE pass after post-reload splitting

But none of those would actually fix this BZ :-)

> 
> [ I don't see any problems with -O2 btw, only with -O1, so should we care
> at all anyway? ]
You can get this stuff at -O2 on other platforms.  It's far more common
than I ever imagined.

Jeff

ps.  THe patch isn't actually correct.  I committed the wrong version.
We should be looking at notrap_p on SRC, not INSN.  WHen you call it on
the insn, the EXPR_LIST for the notes is considered trapping.  Ugh.  I
didn't notice it until I started trying to clean up the 2nd patch and
was seeing dead code much later than I expected :(   I'll fix that goof
tomorrow.




Re: [committed] [PR rtl-optimization/87761] Don't let trivially dead insns impede regcprop's work

2019-02-26 Thread Segher Boessenkool
On Tue, Feb 26, 2019 at 10:07:54AM -0700, Jeff Law wrote:
> As we enter regcprop we have the following horrific RTL:
> >> (insn 35 7 36 2 (set (reg:DI 2 $2 [orig:200 x ] [200])
> >> (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit}
> >>  (nil))
> >> (insn 36 35 10 2 (set (reg:DI 3 $3 [ x+8 ])
> >> (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit}
> >>  (nil))
> >> (insn 10 36 11 2 (set (reg:DI 2 $2 [orig:200 x ] [200])
> >> (reg:DI 4 $4 [206])) "j.c":8:49 313 {*movdi_64bit}
> >>  (nil))
> >> (insn 11 10 37 2 (set (reg:DI 3 $3 [ x+8 ])
> >> (const_int 0 [0])) "j.c":8:49 313 {*movdi_64bit}
> >>  (nil))

where 35+36 used to be a single set in TImode:
insn_cost 4 for29: r200:TI=0
insn_cost 4 for10: r200:TI#0=r197:DI
  REG_DEAD r197:DI
insn_cost 4 for11: r200:TI#8=0

I would hope combine could clean this up, and it tried, but

Trying 29, 10 -> 11:
   29: r200:TI=0
   10: r200:TI#0=r206:DI
  REG_DEAD r206:DI
   11: r200:TI#8=0
Can't combine i2 into i3

This is because i2 (that's insn 10) has a subreg on the LHS.

> You'll note there's a fair amount of obviously dead code.  The dead code
> really hampers regcprop's ability to propagate values.

And prevents all of the RTL optimisers before it from doing any useful
work.  Ideally this should be fixed much earlier.  Not that your patch
isn't pretty obviously correct :-)

[ I don't see any problems with -O2 btw, only with -O1, so should we care
at all anyway? ]


Segher