[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at redhat dot com
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-02-05
     Ever confirmed|0                           |1
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #8 from Segher Boessenkool ---
(In reply to Ilya Leoshkevich from comment #7)
> Apparently, for this specific case doing more of hard register copy
> propagation is enough. I've just tried running pass_cprop_hardreg
> before pass_thread_prologue_and_epilogue, and it helped.
>
> So, would running a mini-cprop_hardreg instead of just
> copyprop_hardreg_forward_bb_without_debug_insn (entry_block) be
> reasonable here? Something along the lines of:
>
> - Do something like pre_and_rev_post_order_compute_fn (), but do not go
>   further from bbs which contain insns satisfying
>   requires_stack_frame_p (), since shrink-wrapping cannot happen past
>   those anyway.

I don't think that is true. Separate shrink-wrapping...

>   Same for bbs which have more than 1 predecessor, since
>   cprop_hardreg forgets everything it saw when it encounters those. Not
>   sure if a reasonable merge function can be defined for struct
>   value_data to improve this?
>
>   Maybe also stop completely when a certain number of bbs is found.
>
> - Do something like pass_cprop_hardreg::execute (), but use only bbs
>   computed during the previous step.

I think running the normal cprop_hardreg here is fine. Or is it so expensive?

> Btw, would reverse postorder be
> the "more intelligent queuing of blocks" mentioned in
> pass_cprop_hardreg::execute ()?

Maybe? It's not totally clear what is wanted here.

> When you say that what IRA does is not effective, do you mean just the
> need to track indirect hard register copies, or can it be improved even
> further?

I mean that split_live_ranges_for_shrink_wrap does not help much.
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #7 from Ilya Leoshkevich ---
Apparently, for this specific case doing more of hard register copy
propagation is enough. I've just tried running pass_cprop_hardreg
before pass_thread_prologue_and_epilogue, and it helped.

So, would running a mini-cprop_hardreg instead of just
copyprop_hardreg_forward_bb_without_debug_insn (entry_block) be
reasonable here? Something along the lines of:

- Do something like pre_and_rev_post_order_compute_fn (), but do not go
  further from bbs which contain insns satisfying
  requires_stack_frame_p (), since shrink-wrapping cannot happen past
  those anyway. Same for bbs which have more than 1 predecessor, since
  cprop_hardreg forgets everything it saw when it encounters those. Not
  sure if a reasonable merge function can be defined for struct
  value_data to improve this?

  Maybe also stop completely when a certain number of bbs is found.

- Do something like pass_cprop_hardreg::execute (), but use only bbs
  computed during the previous step.

Btw, would reverse postorder be the "more intelligent queuing of blocks"
mentioned in pass_cprop_hardreg::execute ()?

When you say that what IRA does is not effective, do you mean just the
need to track indirect hard register copies, or can it be improved even
further?
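
For readers unfamiliar with the proposal in comment #7, the block-selection step can be sketched on a toy CFG. This is only an illustration under stated assumptions, not GCC code: `ToyBlock`, `toy_dfs` and `select_candidate_blocks` are hypothetical names, `needs_frame` stands in for "contains an insn satisfying requires_stack_frame_p ()", and the sketch takes one possible reading of the comment, in which a stopping block is itself recorded but its successors are not followed. The block cap is applied by truncating the reverse postorder rather than by aborting the walk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG node; this is NOT GCC's basic_block.
struct ToyBlock {
    std::vector<int> succs;    // indices of successor blocks
    int num_preds = 0;         // number of incoming edges
    bool needs_frame = false;  // assumption: models requires_stack_frame_p ()
};

// Depth-first walk that records blocks in postorder. The walk does not
// continue past a block that needs a frame or has more than one predecessor.
static void toy_dfs(const std::vector<ToyBlock>& cfg, int b,
                    std::vector<bool>& seen, std::vector<int>& post) {
    seen[b] = true;
    const ToyBlock& blk = cfg[b];
    const bool stop_here = blk.needs_frame || blk.num_preds > 1;
    if (!stop_here) {
        for (int s : blk.succs)
            if (!seen[s])
                toy_dfs(cfg, s, seen, post);
    }
    post.push_back(b);
}

// Return the candidate blocks in reverse postorder, keeping at most
// max_blocks of them.
std::vector<int> select_candidate_blocks(const std::vector<ToyBlock>& cfg,
                                         int entry, std::size_t max_blocks) {
    assert(entry >= 0 && static_cast<std::size_t>(entry) < cfg.size());
    std::vector<bool> seen(cfg.size(), false);
    std::vector<int> post;
    toy_dfs(cfg, entry, seen, post);
    // Reversing a postorder gives a reverse postorder.
    std::vector<int> rpo(post.rbegin(), post.rend());
    if (rpo.size() > max_blocks)
        rpo.resize(max_blocks);
    return rpo;
}
```

A copy-propagation step in the spirit of pass_cprop_hardreg::execute () would then be run only over the returned blocks, in that order. Whether stopping at frame-requiring blocks is actually right is questioned in comment #8.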
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #6 from Segher Boessenkool ---
Oh sure, if all you want to do is extend the prepare_shrink_wrap function,
that just works there and it doesn't need to do a lot of profitability
trade-offs. However it isn't very effective there. It's better to do it
just before register allocation. IRA tries to do a little, too, also not
very effective :-(

If you want to just extend prepare_shrink_wrap, so that it handles more
than just the first BB, what order should it try? Should it be just
greedy, or should it look how it can get best gain?

Shrink-wrapping could wrap about 3x as many BBs as it does currently, but
just extending prepare_shrink_wrap doesn't get very far. Which is not an
argument to not do a better job there, of course ;-)
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #5 from Ilya Leoshkevich ---
By the time shrink-wrapping is performed, which is after LRA
(pass_thread_prologue_and_epilogue, to be precise), aren't all spilling
decisions already made? Because if that's true, we have to be
conservative in prepare_shrink_wrap () anyway, and move down copies only
when the parameter register still contains the parameter value.
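
The conservative condition in comment #5 (move a copy down only while the source register still holds the incoming value) can be illustrated with a toy legality check. This is a sketch under stated assumptions and not what prepare_shrink_wrap () actually does: `ToyInsn`, `ToyCopy`, `can_sink_past` and `sinkable_prefix` are hypothetical names, and instructions are modelled only by the hard register numbers they write and read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction: only the hard registers it writes and reads.
struct ToyInsn {
    std::vector<int> defs;
    std::vector<int> uses;
};

// A register-to-register copy "dst = src", e.g. a copy out of an
// incoming parameter register.
struct ToyCopy {
    int dst;
    int src;
};

static bool mentions(const std::vector<int>& regs, int r) {
    return std::find(regs.begin(), regs.end(), r) != regs.end();
}

// The copy may move below insn only if insn leaves src intact (so src
// still contains the original value) and neither reads nor writes dst.
bool can_sink_past(const ToyCopy& copy, const ToyInsn& insn) {
    return !mentions(insn.defs, copy.src)
        && !mentions(insn.defs, copy.dst)
        && !mentions(insn.uses, copy.dst);
}

// Number of leading instructions in following that the copy can be
// moved below before the first one that blocks it.
std::size_t sinkable_prefix(const ToyCopy& copy,
                            const std::vector<ToyInsn>& following) {
    assert(copy.dst != copy.src);
    std::size_t n = 0;
    while (n < following.size() && can_sink_past(copy, following[n]))
        ++n;
    return n;
}
```

Note that an instruction which merely reads src does not block the move in this model; only a write to src, or any mention of dst, does.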
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #4 from Segher Boessenkool ---
All instructions that depend on the new registers can start later, too,
if you move all new registers down.

If you move copies from hard registers down it is much worse: you are
extending the lifetime of those hard regs so that nothing else can live
there, but more likely, it will have to be spilled even.
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #3 from Ilya Leoshkevich ---
Judging by the following comment in lra-coalesce.c, RA doesn't do this
intentionally:

    Here we coalesce only spilled pseudos. Coalescing non-spilled
    pseudos (with different hard regs) might result in spilling
    additional pseudos because of possible conflicts with other
    non-spilled pseudos and, as a consequence, in more constraint
    passes and even LRA infinite cycling. Trivial the same hard
    register moves will be removed by subsequent compiler passes.

In which cases would moving copies down in prepare_shrink_wrap () make
the code worse?
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #2 from Segher Boessenkool ---
So why does it use r12 there if it could use r2? That's an RA problem.

This is related to PR87708, in a way.

prepare_shrink_wrap needs a good overhaul. Moving all copies down also
*degrades* code quality, more often if you don't restrict it to the
first BB.
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

--- Comment #1 from Ilya Leoshkevich ---
Bisect points to r265398:

    combine: Do not combine moves from hard registers.

I wonder what would be the best place to fix this? I was thinking about
making shrink-wrapping try harder by not limiting the processing to the
first basic block.
[Bug rtl-optimization/87902] [9 Regression] Shrink-wrapping multiple conditions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87902

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |9.0