[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|4.8.5                       |4.9.0
      Known to fail|                            |4.8.5

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed for 4.9.0.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.4                       |4.8.5

--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.4 has been released.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.3                       |4.8.4

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 4.8.3 is being released, adjusting target milestone.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.8/4.9 Regression]:       |[4.8 Regression]:
                   |Suboptimal register         |Suboptimal register
                   |allocation                  |allocation

--- Comment #17 from Uroš Bizjak <ubizjak at gmail dot com> ---
gcc-4.9 now generates:

f:
        addsd   %xmm2, %xmm0
        ret

The problem is fixed in 4.9, reconfirmed on 4.8 branch.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
                 CC|                            |hjl at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-11 10:46:16 UTC ---
Thus CCing the offending people.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|uros at gcc dot gnu.org      |

--- Comment #11 from Uros Bizjak <ubizjak at gmail dot com> 2013-03-11 11:49:03 UTC ---
(In reply to comment #10)
> Thus CCing the offending people.

One of the offending people is the reporter.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1                          |P2

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-03-11 11:57:57 UTC ---
I don't think this is P1. When looking at the dumps, right after expansion the
4.7 expanded code actually looks much worse compared to the 4.8 expanded one
(4.7 goes through memory, while 4.8 goes through subregs). In *.ud-dce it is
pretty much comparable, though 4.7 has one extra move:

    2 r62:DF=xmm0:DF
      REG_DEAD: xmm0:DF
    6 r64:DF=xmm2:DF
      REG_DEAD: xmm2:DF
   21 r66:DF=r62:DF
      REG_DEAD: r62:DF
    7 NOTE_INSN_FUNCTION_BEG
   10 r65:DF=r64:DF+r66:DF
      REG_DEAD: r66:DF
      REG_DEAD: r64:DF
   15 xmm0:DF=r65:DF
      REG_DEAD: r65:DF
   18 use xmm0:DF

in 4.7 vs.

    2: r64:DF=xmm0:DF
      REG_DEAD xmm0:DF
    8: r66:DF=xmm2:DF
      REG_DEAD xmm2:DF
    9: NOTE_INSN_FUNCTION_BEG
   12: r67:DF=r66:DF+r64:DF
      REG_DEAD r66:DF
      REG_DEAD r64:DF
   17: xmm0:DF=r67:DF
      REG_DEAD r67:DF
   20: use xmm0:DF

in 4.8.

The combiner change is what matters for the later behavior of regmove, RA and
reload/LRA, but unfortunately that is an ICE fix that can't be reverted: we
really must avoid propagating hard registers too early, otherwise RA needs to
just give up.

So, in 4.7 we end up with r65:DF=xmm2:DF+r62:DF after combine, while 4.8 still
uses pseudos. That is why regmove turns the insn into r65:DF=xmm2:DF+r65:DF,
which is optimal for this testcase: it has no other choice (the operand was
already a hard reg). In 4.8 it has a choice (it sees two different pseudos) and
in this case picks the wrong one.

I'm afraid there is no easy fix, so IMHO this needs to be postponed for 4.9.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-08 15:39:21 UTC ---
So - what regressed this compared to 4.7? It wasn't regmove.c changes.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #7 from Uros Bizjak <ubizjak at gmail dot com> 2013-03-08 15:51:15 UTC ---
(In reply to comment #6)
> So - what regressed this compared to 4.7? It wasn't regmove.c changes.

As said in comment 0:

gcc-4.7 generates:

f:
        addsd   %xmm2, %xmm0
        ret

gcc-4.8 generates:

f:
        addsd   %xmm0, %xmm2
        movapd  %xmm2, %xmm0
        ret
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

--- Comment #8 from Steven Bosscher <steven at gcc dot gnu.org> 2013-03-08 18:49:09 UTC ---
(In reply to comment #6)
> So - what regressed this compared to 4.7? It wasn't regmove.c changes.

Probably LRA, it better respects IRA's choices (which is good).
The fix should be found in a change to IRA or regmove.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-03-08 19:46:59 UTC ---
You don't need to guess;
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55941#c1 mentions the commits that
regressed it.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Target Milestone|---                         |4.8.0
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #1 from Steven Bosscher <steven at gcc dot gnu.org> 2013-02-15 09:51:56 UTC ---
Confirmed.

Actually, it looks to me like IRA already makes this allocation:

Disposition:
    1:r64  l0    21    0:r67  l0    23

(21=%xmm0, 23=%xmm2)

.ira dump:

;; basic block 2
    2: r64:DF=xmm0:DF
    8: r67:DF=xmm2:DF
   12: r67:DF=r67:DF+r64:DF
   17: xmm0:DF=r67:DF
   20: use xmm0:DF

.reload (LRA) dump:

** Local #1: **

   Choosing alt 0 in insn 12:  (0) =x  (1) %0  (2) xm

;; basic block 2
    2: xmm0:DF=xmm0:DF
    8: xmm2:DF=xmm2:DF
   12: xmm2:DF=xmm2:DF+xmm0:DF
   17: xmm0:DF=xmm2:DF
   20: use xmm0:DF

So all LRA does is follow IRA's recommended allocation. That is IMHO the right
thing to do, too.
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-02-15
     Ever Confirmed|0                           |1
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

--- Comment #2 from Steven Bosscher <steven at gcc dot gnu.org> 2013-02-15 10:00:13 UTC ---
The unbreakable insn 12, xmm2:DF=xmm2:DF+xmm0:DF, is created by regmove.

.ce3 dump:

    2: r64:DF=xmm0:DF
    8: r66:DF=xmm2:DF
   12: r67:DF=r66:DF+r64:DF
   17: xmm0:DF=r67:DF
   20: use xmm0:DF

.regmove dump:

Could fix operand 1 of insn 12 matching operand 0.

    2: r64:DF=xmm0:DF
    8: r67:DF=xmm2:DF
   12: r67:DF=r67:DF+r64:DF
   17: xmm0:DF=r67:DF
   20: use xmm0:DF

With -fno-regmove:

        addsd   %xmm2, %xmm0
        ret
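
To make the cost of that choice concrete, here is a minimal toy model in C. It
is not GCC code: emit_add and its parameters are hypothetical names introduced
only for illustration. It assumes a two-address addsd, two inputs that arrive
in hard registers, and a result that must end up in a given return register,
and it shows that tying the destination to the "wrong" input costs one extra
movapd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model, not GCC code.  Emit AT&T-syntax SSE assembly for
   "return a + b", where a arrives in in_a, b arrives in in_b and the
   result must be returned in ret.  addsd is two-address, so the
   destination doubles as one of the inputs: tie_a != 0 ties the
   destination to a's register, tie_a == 0 ties it to b's register.
   Returns the number of instructions emitted before the final "ret".  */
static int
emit_add (const char *in_a, const char *in_b, const char *ret,
          int tie_a, char *buf, size_t len)
{
  const char *dst = tie_a ? in_a : in_b;
  const char *src = tie_a ? in_b : in_a;
  int n = 1;
  int used;

  assert (buf != NULL && len > 0);

  used = snprintf (buf, len, "addsd %%%s, %%%s\n", src, dst);
  if (used < 0 || (size_t) used >= len)
    return n;                   /* Output truncated; nothing more to add.  */

  if (strcmp (dst, ret) != 0)
    {
      /* The sum landed in the wrong register; copy it into place.  */
      snprintf (buf + used, len - used, "movapd %%%s, %%%s\n", dst, ret);
      n++;
    }
  return n;
}
```

Calling emit_add ("xmm0", "xmm2", "xmm0", 1, ...) models the 4.7 and
-fno-regmove result (one instruction), while tie_a == 0 models the choice
regmove makes in 4.8 (two instructions).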
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2013-02-15 10:17:19 UTC ---
Before regmove, both input operands die in insn 12:

   12: r67:DF=r66:DF+r64:DF
      REG_DEAD r66:DF
      REG_DEAD r64:DF

and because reg classes haven't been set up, r66 and r67 both have GENERAL_REGS
as their preferred register class, so regmove doesn't see that r64 and r67 share
the same preferred register %xmm0:

Breakpoint 1, regmove_backward_pass () at ../../trunk/gcc/regmove.c:1088
1088      if (dump_file)
(gdb) p reg_preferred_class (64)
$10 = GENERAL_REGS
(gdb) p reg_preferred_class (66)
$11 = GENERAL_REGS
(gdb) p reg_preferred_class (67)
$12 = GENERAL_REGS
(gdb) p ira_set_pseudo_classes (true, dump_file)
$13 = void
(gdb) p reg_preferred_class (64)
$14 = SSE_FIRST_REG
(gdb) p reg_preferred_class (66)
$15 = SSE_REGS
(gdb) p reg_preferred_class (67)
$16 = SSE_FIRST_REG
(gdb)
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

--- Comment #4 from Steven Bosscher <steven at gcc dot gnu.org> 2013-02-15 10:20:52 UTC ---
Perhaps for regmove IRA classes should be set up unconditionally:

Index: regmove.c
===================================================================
--- regmove.c   (revision 196074)
+++ regmove.c   (working copy)
@@ -1234,8 +1234,9 @@ regmove_optimize (void)
   regstat_init_n_sets_and_refs ();
   regstat_compute_ri ();

-  if (flag_ira_loop_pressure)
-    ira_set_pseudo_classes (true, dump_file);
+  /* Set up register classes for pseudos, so that reg_preferred_class
+     returns a more useful result.  */
+  ira_set_pseudo_classes (true, dump_file);

   regno_src_regno = XNEWVEC (int, nregs);
   for (i = nregs; --i >= 0; )
@@ -1256,8 +1257,7 @@ regmove_optimize (void)
     }
   regstat_free_n_sets_and_refs ();
   regstat_free_ri ();
-  if (flag_ira_loop_pressure)
-    free_reg_info ();
+  free_reg_info ();
   return 0;
 }
[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

--- Comment #5 from Vladimir Makarov <vmakarov at gcc dot gnu.org> 2013-02-15 16:48:19 UTC ---
(In reply to comment #4)
> Perhaps for regmove IRA classes should be set up unconditionally:
>
> Index: regmove.c
> ===================================================================
> --- regmove.c   (revision 196074)
> +++ regmove.c   (working copy)
> @@ -1234,8 +1234,9 @@ regmove_optimize (void)
>    regstat_init_n_sets_and_refs ();
>    regstat_compute_ri ();
>
> -  if (flag_ira_loop_pressure)
> -    ira_set_pseudo_classes (true, dump_file);
> +  /* Set up register classes for pseudos, so that reg_preferred_class
> +     returns a more useful result.  */
> +  ira_set_pseudo_classes (true, dump_file);
>
>    regno_src_regno = XNEWVEC (int, nregs);
>    for (i = nregs; --i >= 0; )
> @@ -1256,8 +1257,7 @@ regmove_optimize (void)
>      }
>    regstat_free_n_sets_and_refs ();
>    regstat_free_ri ();
> -  if (flag_ira_loop_pressure)
> -    free_reg_info ();
> +  free_reg_info ();
>    return 0;
>  }

It can be a solution. I see only one drawback: it is expensive. Setting classes
is an expensive procedure requiring 2 passes over all insns, their alternatives,
and classes for each pseudo operand.

In general, it still will not work for other cases. We are lucky that xmm0
forms its own class, SSE_FIRST_REG. Regmove for general cases should see hard
regs, not classes.

This is not the first PR about regmove. I'd like to remove the big part of
regmove concerning matching operands, as IRA/LRA can deal with this.
Unfortunately, they do not handle it too well when hard regs are exposed in
RTL, and some work should be done to improve this code. I am going to do that
for gcc 4.9.
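
As a rough illustration of what "see hard regs, not classes" could mean, the
following C sketch picks the operand to tie by comparing hard-register hints
rather than preferred classes. It is only a toy under stated assumptions:
struct pseudo, hard_hint and choose_tied_operand are hypothetical names, not
GCC data structures or functions, and the hint is assumed to be the hard
register a pseudo is copied from or to (21 for %xmm0 and 23 for %xmm2, as in
the IRA disposition in comment #1).

```c
#include <assert.h>

/* Hypothetical toy type, not one of GCC's data structures.  */
struct pseudo
{
  int regno;      /* Pseudo register number, e.g. 64.  */
  int hard_hint;  /* Hard reg this pseudo is copied from or to; -1 if none.  */
};

/* For a commutative two-address insn dst = op1 + op2 in which both
   inputs die, return which operand (1 or 2) to tie to dst.  Prefer the
   input whose hard-register hint equals dst's hint, so that the final
   copy into the hard register becomes a no-op.  Fall back to operand 1
   when the hints give no guidance.  */
static int
choose_tied_operand (const struct pseudo *dst,
                     const struct pseudo *op1,
                     const struct pseudo *op2)
{
  assert (dst && op1 && op2);

  if (dst->hard_hint >= 0)
    {
      if (op1->hard_hint == dst->hard_hint)
        return 1;
      if (op2->hard_hint == dst->hard_hint)
        return 2;
    }
  return 1;
}
```

For insn 12 of this PR (r67 = r66 + r64, with r67 copied to %xmm0, r66 copied
from %xmm2 and r64 copied from %xmm0) the sketch selects operand 2, i.e. r64,
which corresponds to the single addsd that 4.7 generates.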