[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2019-03-27 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at redhat dot com

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2019-02-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-02-05
     Ever confirmed|0                           |1

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2018-11-27 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #8 from Segher Boessenkool  ---
(In reply to Ilya Leoshkevich from comment #7)
> Apparently, for this specific case doing more of hard register copy
> propagation is enough.  I've just tried running pass_cprop_hardreg
> before pass_thread_prologue_and_epilogue, and it helped.
> 
> So, would running a mini-cprop_hardreg instead of just
> copyprop_hardreg_forward_bb_without_debug_insn (entry_block) be
> reasonable here?  Something along the lines of:
> 
> - Do something like pre_and_rev_post_order_compute_fn (), but do not go
>   further from bbs which contain insns satisfying
>   requires_stack_frame_p (), since shrink-wrapping cannot happen past
>   those anyway.

I don't think that is true.  Separate shrink-wrapping...

>   Same for bbs which have more than 1 predecessor, since
>   cprop_hardreg forgets everything it saw when it encounters those.  Not
>   sure if a reasonable merge function can be defined for struct
>   value_data to improve this?
> 
>   Maybe also stop completely when a certain number of bbs is found.
> 
> - Do something like pass_cprop_hardreg::execute (), but use only bbs
>   computed during the previous step.

I think running the normal cprop_hardreg here is fine.  Or is it so
expensive?

>   Btw, would reverse postorder be
>   the "more intelligent queuing of blocks" mentioned in
>   pass_cprop_hardreg::execute ()?

Maybe?  It's not totally clear what is wanted here.

> When you say that what IRA does is not effective, do you mean just the
> need to track indirect hard register copies, or can it be improved even
> further?

I mean that split_live_ranges_for_shrink_wrap does not help much.

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2018-11-23 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #7 from Ilya Leoshkevich  ---
Apparently, for this specific case doing more of hard register copy
propagation is enough.  I've just tried running pass_cprop_hardreg
before pass_thread_prologue_and_epilogue, and it helped.
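
To illustrate why copy propagation can be enough here, the following is a
toy model of forward copy propagation over one straight-line block.  It is
only a sketch: struct toy_insn, toy_cprop_block and the register numbers are
hypothetical and are not GCC's regcprop.c data structures.  Once every use of
the copy's destination is rewritten to read the original register, the copy
itself no longer has to stay near the top of the function.

```c
#include <assert.h>
#include <stdbool.h>

enum { NUM_REGS = 16 };

/* Hypothetical one-source instruction: either a plain copy
   (dest = src) or some other operation (dest = op (src)).  */
struct toy_insn
{
  bool is_copy;
  int dest;
  int src;
};

/* Forward copy propagation over one straight-line block.
   Rewrites each source to the oldest register known to hold the
   same value and returns the number of sources rewritten.  */
int
toy_cprop_block (struct toy_insn *insns, int n)
{
  int copy_of[NUM_REGS];
  int changed = 0;

  for (int r = 0; r < NUM_REGS; r++)
    copy_of[r] = r;

  for (int i = 0; i < n; i++)
    {
      struct toy_insn *insn = &insns[i];
      assert (insn->src >= 0 && insn->src < NUM_REGS);
      assert (insn->dest >= 0 && insn->dest < NUM_REGS);

      /* Use the original register instead of its copy.  */
      int root = copy_of[insn->src];
      if (root != insn->src)
        {
          insn->src = root;
          changed++;
        }

      /* Writing DEST kills every equivalence that mentions it.  */
      for (int r = 0; r < NUM_REGS; r++)
        if (copy_of[r] == insn->dest)
          copy_of[r] = r;

      /* A copy makes DEST equivalent to its (rewritten) source.  */
      copy_of[insn->dest] = insn->is_copy ? insn->src : insn->dest;
    }
  return changed;
}
```

For a block such as "r12 = r2; r9 = op (r12); r10 = op (r12)" the two uses
are rewritten to read r2, while an intervening write to r2 blocks the
rewrite.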

So, would running a mini-cprop_hardreg instead of just
copyprop_hardreg_forward_bb_without_debug_insn (entry_block) be
reasonable here?  Something along the lines of:

- Do something like pre_and_rev_post_order_compute_fn (), but do not go
  further from bbs which contain insns satisfying
  requires_stack_frame_p (), since shrink-wrapping cannot happen past
  those anyway.

  Same for bbs which have more than 1 predecessor, since
  cprop_hardreg forgets everything it saw when it encounters those.  Not
  sure if a reasonable merge function can be defined for struct
  value_data to improve this?

  Maybe also stop completely when a certain number of bbs is found.

- Do something like pass_cprop_hardreg::execute (), but use only bbs
  computed during the previous step.  Btw, would reverse postorder be
  the "more intelligent queuing of blocks" mentioned in
  pass_cprop_hardreg::execute ()?

When you say that what IRA does is not effective, do you mean just the
need to track indirect hard register copies, or can it be improved even
further?
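
The block-selection step proposed in this comment could look roughly like
the sketch below.  It runs on a toy CFG, not on GCC's basic_block
structures: struct toy_bb, toy_select_blocks and the needs_frame flag
(standing in for "contains an insn satisfying requires_stack_frame_p ()")
are hypothetical.  It uses a plain preorder walk rather than
pre_and_rev_post_order_compute_fn (), and it reads "same for bbs which have
more than 1 predecessor" as "do not enter such blocks at all", which is one
possible interpretation.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_BLOCKS = 32, MAX_SUCCS = 2 };

/* Hypothetical basic block: block 0 is the entry block.  */
struct toy_bb
{
  bool needs_frame;   /* Some insn here requires the stack frame.  */
  int n_preds;        /* Number of predecessor edges.  */
  int n_succs;
  int succs[MAX_SUCCS];
};

/* Collect into OUT, in preorder from block 0, the blocks worth
   running copy propagation on.  The walk does not continue past a
   block that needs the frame, does not enter join blocks, and stops
   once LIMIT blocks have been collected.  Returns the block count.  */
int
toy_select_blocks (const struct toy_bb *cfg, int n_blocks, int limit,
                   int *out)
{
  bool seen[MAX_BLOCKS] = { false };
  int stack[MAX_BLOCKS];
  int sp = 0, n_out = 0;

  assert (n_blocks > 0 && n_blocks <= MAX_BLOCKS);
  stack[sp++] = 0;
  seen[0] = true;

  while (sp > 0 && n_out < limit)
    {
      int b = stack[--sp];
      out[n_out++] = b;

      /* Shrink-wrapping cannot happen past this block.  */
      if (cfg[b].needs_frame)
        continue;

      /* Push successors in reverse so succs[0] is visited first.  */
      for (int i = cfg[b].n_succs - 1; i >= 0; i--)
        {
          int s = cfg[b].succs[i];
          /* Copy information would be discarded at a join block.  */
          if (seen[s] || cfg[s].n_preds > 1)
            continue;
          seen[s] = true;
          stack[sp++] = s;
        }
    }
  return n_out;
}
```

Because only single-predecessor blocks are entered, every collected block is
visited after its predecessor, which is the ordering the second step would
need.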

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2018-11-14 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #6 from Segher Boessenkool  ---
Oh sure, if all you want to do is extend the prepare_shrink_wrap function,
that just works there and it doesn't need to do a lot of profitability
trade-offs.  However it isn't very effective there.  It's better to do it
just before register allocation.  IRA tries to do a little, too, also not
very effective :-(

If you want to just extend prepare_shrink_wrap, so that it handles more than
just the first BB, what order should it try?  Should it be just greedy, or
should it look for the order that gives the best gain?

Shrink-wrapping could wrap about 3x as many BBs as it does currently, but
just extending prepare_shrink_wrap doesn't get very far.  Which is not an
argument to not do a better job there, of course ;-)

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2018-11-09 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #5 from Ilya Leoshkevich  ---
By the time shrink-wrapping is performed, which is after LRA
(pass_thread_prologue_and_epilogue, to be precise), aren't all spilling
decisions already made?  Because if that's true, we have to be
conservative in prepare_shrink_wrap () anyway, and move down copies only
when the parameter register still contains the parameter value.

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2018-11-08 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #4 from Segher Boessenkool  ---
All instructions that depend on the new registers can start later, too, if
you move all new registers down.  If you move copies from hard registers
down it is much worse: you are extending the lifetime of those hard regs
so that nothing else can live there, but more likely, it will have to be
spilled even.

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2018-11-08 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #3 from Ilya Leoshkevich  ---
Judging by the following comment in lra-coalesce.c, RA intentionally does
not do this:

   Here we coalesce only spilled pseudos.  Coalescing non-spilled
   pseudos (with different hard regs) might result in spilling
   additional pseudos because of possible conflicts with other
   non-spilled pseudos and, as a consequence, in more constraint
   passes and even LRA infinite cycling.  Trivial the same hard
   register moves will be removed by subsequent compiler passes.

In which cases would moving copies down in prepare_shrink_wrap () make
the code worse?

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2018-11-06 Thread segher at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #2 from Segher Boessenkool  ---
So why does it use r12 there if it could use r2?  That's an RA problem.
This is related to PR87708, in a way.

prepare_shrink_wrap needs a good overhaul.  Moving all copies down also
*degrades* code quality, more often if you don't restrict it to the
first BB.

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2018-11-06 Thread iii at linux dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #1 from Ilya Leoshkevich  ---
Bisect points to r265398: combine: Do not combine moves from hard
registers.

I wonder what would be the best place to fix this?  I was thinking about
making shrink-wrapping try harder by not limiting the processing to the
first basic block.
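
For readers without the bug's attachment at hand, the kind of function under
discussion has roughly the following shape.  This is an illustrative sketch
only, not the test case from the PR: slow_path and two_conditions are
hypothetical names, and whether a given GCC version and target actually
shrink-wraps it is not claimed here.

```c
/* Hypothetical stand-in for work that needs a stack frame; keeping
   it out of line means the caller contains real calls.  */
__attribute__ ((noinline)) static int
slow_path (int x)
{
  return x * 2 + 1;
}

/* Two early exits come before the only block that makes calls.
   Ideally the prologue is emitted only on the path that reaches
   the calls, after both conditions have been tested.  */
int
two_conditions (int a, int b)
{
  if (a == 0)
    return -1;
  if (b == 0)
    return -2;
  return slow_path (a) + slow_path (b);
}
```

If a copy of a parameter register ends up in the entry block and its uses
sit behind the second test, handling only the first basic block is not
enough to move the prologue below both conditions.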

[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions

2018-11-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.0