[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2015-06-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|4.8.5   |4.9.0
  Known to fail||4.8.5

--- Comment #20 from Richard Biener rguenth at gcc dot gnu.org ---
Fixed for 4.9.0.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2014-12-19 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|4.8.4   |4.8.5

--- Comment #19 from Jakub Jelinek jakub at gcc dot gnu.org ---
GCC 4.8.4 has been released.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2014-05-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|4.8.3   |4.8.4

--- Comment #18 from Richard Biener rguenth at gcc dot gnu.org ---
GCC 4.8.3 is being released, adjusting target milestone.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-12-12 Thread ubizjak at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339

Uroš Bizjak ubizjak at gmail dot com changed:

   What|Removed |Added

Summary|[4.8/4.9 Regression]:   |[4.8 Regression]:
   |Suboptimal register |Suboptimal register
   |allocation  |allocation

--- Comment #17 from Uroš Bizjak ubizjak at gmail dot com ---
gcc-4.9 now generates:

f:
addsd   %xmm2, %xmm0
ret

The problem is fixed in 4.9, reconfirmed on 4.8 branch.

[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-03-11 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



Richard Biener rguenth at gcc dot gnu.org changed:



   What|Removed |Added



   Priority|P3  |P1

 CC||hjl at gcc dot gnu.org,

   ||uros at gcc dot gnu.org



--- Comment #10 from Richard Biener rguenth at gcc dot gnu.org 2013-03-11 
10:46:16 UTC ---

Thus CCing the offending people.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-03-11 Thread ubizjak at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



Uros Bizjak ubizjak at gmail dot com changed:



   What|Removed |Added



 CC|uros at gcc dot gnu.org |



--- Comment #11 from Uros Bizjak ubizjak at gmail dot com 2013-03-11 11:49:03 
UTC ---

(In reply to comment #10)

 Thus CCing the offending people.



One of the offending people is the reporter.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-03-11 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



Jakub Jelinek jakub at gcc dot gnu.org changed:



   What|Removed |Added



   Priority|P1  |P2



--- Comment #12 from Jakub Jelinek jakub at gcc dot gnu.org 2013-03-11 
11:57:57 UTC ---

I don't think this is P1.

When looking at the dumps, right after expansion the 4.7 expanded code actually

looks much worse compared to the 4.8 expanded one (4.7 goes through memory,

while 4.8 through subregs), in *.ud-dce it is pretty much comparable, though

4.7 has one extra move:

2 r62:DF=xmm0:DF

  REG_DEAD: xmm0:DF

6 r64:DF=xmm2:DF

  REG_DEAD: xmm2:DF

   21 r66:DF=r62:DF

  REG_DEAD: r62:DF

7 NOTE_INSN_FUNCTION_BEG

   10 r65:DF=r64:DF+r66:DF

  REG_DEAD: r66:DF

  REG_DEAD: r64:DF

   15 xmm0:DF=r65:DF

  REG_DEAD: r65:DF

   18 use xmm0:DF

in 4.7 vs.

2: r64:DF=xmm0:DF

  REG_DEAD xmm0:DF

8: r66:DF=xmm2:DF

  REG_DEAD xmm2:DF

9: NOTE_INSN_FUNCTION_BEG

   12: r67:DF=r66:DF+r64:DF

  REG_DEAD r66:DF

  REG_DEAD r64:DF

   17: xmm0:DF=r67:DF

  REG_DEAD r67:DF

   20: use xmm0:DF

in 4.8.  The combiner change is what matters for the later behavior of regmove,

RA and reload/LRA, but unfortunately that is an ICE fix that can't be reverted,

we really must avoid to propagating hard registers too early, otherwise RA

needs to just give up.

So, in 4.7 we end up with r65:DF=xmm2:DF+r62:DF after combine, while 4.8 still

uses pseudos.  Which is the reason why regmove sets that into the optimal for

this testcase r65:DF=xmm2:DF+r65:DF, because it has no other choice (the

operand was already a hard reg), while in 4.8 it has a choice (sees two

different pseudos, and chooses in this case the wrong one).  I'm afraid there

is no easy fix, so IMHO this needs to be postponed for 4.9.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-03-08 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



Richard Biener rguenth at gcc dot gnu.org changed:



   What|Removed |Added



 Status|NEW |WAITING



--- Comment #6 from Richard Biener rguenth at gcc dot gnu.org 2013-03-08 
15:39:21 UTC ---

So - what regressed this compared to 4.7?  It wasn't regmove.c changes.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-03-08 Thread ubizjak at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



Uros Bizjak ubizjak at gmail dot com changed:



   What|Removed |Added



 Status|WAITING |NEW



--- Comment #7 from Uros Bizjak ubizjak at gmail dot com 2013-03-08 15:51:15 
UTC ---

(In reply to comment #6)

 So - what regressed this compared to 4.7?  It wasn't regmove.c changes.



As said in comment 0:



gcc-4.7 generates:



f:

addsd   %xmm2, %xmm0

ret



gcc-4.8 generates:



f:

addsd   %xmm0, %xmm2

movapd  %xmm2, %xmm0

ret


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-03-08 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



--- Comment #8 from Steven Bosscher steven at gcc dot gnu.org 2013-03-08 
18:49:09 UTC ---

(In reply to comment #6)

 So - what regressed this compared to 4.7?  It wasn't regmove.c changes.



Probably LRA, it better respects IRA's choices (which is good).  The fix

should be found in a change to IRA or regmove.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-03-08 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



Jakub Jelinek jakub at gcc dot gnu.org changed:



   What|Removed |Added



 CC||jakub at gcc dot gnu.org



--- Comment #9 from Jakub Jelinek jakub at gcc dot gnu.org 2013-03-08 
19:46:59 UTC ---

You don't need to guess, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55941#c1

mentions the commits that regressed it.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-02-15 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



Richard Biener rguenth at gcc dot gnu.org changed:



   What|Removed |Added



   Keywords||missed-optimization

   Target Milestone|--- |4.8.0


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-02-15 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



Steven Bosscher steven at gcc dot gnu.org changed:



   What|Removed |Added



 CC||steven at gcc dot gnu.org



--- Comment #1 from Steven Bosscher steven at gcc dot gnu.org 2013-02-15 
09:51:56 UTC ---

Confirmed.



Actually, it looks to me like IRA already makes this allocation:

Disposition:

1:r64  l0210:r67  l023

(21=%xmm0, 23=%xmm2)



= .ira dump:



;; basic block 2

2: r64:DF=xmm0:DF

8: r67:DF=xmm2:DF

   12: r67:DF=r67:DF+r64:DF

   17: xmm0:DF=r67:DF

   20: use xmm0:DF



= .reload (LRA) dump:



** Local #1: **

 Choosing alt 0 in insn 12:  (0) =x  (1) %0  (2) xm



;; basic block 2

2: xmm0:DF=xmm0:DF

8: xmm2:DF=xmm2:DF

   12: xmm2:DF=xmm2:DF+xmm0:DF

   17: xmm0:DF=xmm2:DF

   20: use xmm0:DF



So all LRA does, is follow IRA's recommended allocation.  That is IMHO

the right thing to do, too.


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-02-15 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



Steven Bosscher steven at gcc dot gnu.org changed:



   What|Removed |Added



 Status|UNCONFIRMED |NEW

   Last reconfirmed||2013-02-15

 Ever Confirmed|0   |1


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-02-15 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



--- Comment #2 from Steven Bosscher steven at gcc dot gnu.org 2013-02-15 
10:00:13 UTC ---

The unbreakable insns 12 xmm2:DF=xmm2:DF+xmm0:DF is created by regmove.



.ce3 dump:

2: r64:DF=xmm0:DF

8: r66:DF=xmm2:DF

   12: r67:DF=r66:DF+r64:DF

   17: xmm0:DF=r67:DF

   20: use xmm0:DF



.regmove dump:

Could fix operand 1 of insn 12 matching operand 0.

2: r64:DF=xmm0:DF

8: r67:DF=xmm2:DF

   12: r67:DF=r67:DF+r64:DF

   17: xmm0:DF=r67:DF

   20: use xmm0:DF



With -fno-regmove:

addsd   %xmm2, %xmm0

ret


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-02-15 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



--- Comment #3 from Steven Bosscher steven at gcc dot gnu.org 2013-02-15 
10:17:19 UTC ---

Before regmove, both input operands die in insn 12:

   12: r67:DF=r66:DF+r64:DF

  REG_DEAD r66:DF

  REG_DEAD r64:DF



and because reg classes haven't been set up, r66 and r67 both have

GENERAL_REGS as their preferred register class so regmove doesn't see

that r64 and r67 share the same preferred register %xmm0:



Breakpoint 1, regmove_backward_pass () at ../../trunk/gcc/regmove.c:1088

1088  if (dump_file)

(gdb) p reg_preferred_class (64) 

$10 = GENERAL_REGS

(gdb) p reg_preferred_class (66)

$11 = GENERAL_REGS

(gdb) p reg_preferred_class (67)

$12 = GENERAL_REGS

(gdb) p ira_set_pseudo_classes (true, dump_file)

$13 = void

(gdb) p reg_preferred_class (64)

$14 = SSE_FIRST_REG

(gdb) p reg_preferred_class (66)

$15 = SSE_REGS

(gdb) p reg_preferred_class (67)

$16 = SSE_FIRST_REG

(gdb)


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-02-15 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



--- Comment #4 from Steven Bosscher steven at gcc dot gnu.org 2013-02-15 
10:20:52 UTC ---

Perhaps for regmove IRA classes should be set up unconditionally:



Index: regmove.c

===

--- regmove.c   (revision 196074)

+++ regmove.c   (working copy)

@@ -1234,8 +1234,9 @@ regmove_optimize (void)

   regstat_init_n_sets_and_refs ();

   regstat_compute_ri ();



-  if (flag_ira_loop_pressure)

-ira_set_pseudo_classes (true, dump_file);

+  /* Set up register classes for pseudos, so that reg_preferred_class

+ returns a more useful result.  */

+  ira_set_pseudo_classes (true, dump_file);



   regno_src_regno = XNEWVEC (int, nregs);

   for (i = nregs; --i = 0; )

@@ -1256,8 +1257,7 @@ regmove_optimize (void)

 }

   regstat_free_n_sets_and_refs ();

   regstat_free_ri ();

-  if (flag_ira_loop_pressure)

-free_reg_info ();

+  free_reg_info ();

   return 0;

 }


[Bug rtl-optimization/56339] [4.8 Regression]: Suboptimal register allocation

2013-02-15 Thread vmakarov at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56339



--- Comment #5 from Vladimir Makarov vmakarov at gcc dot gnu.org 2013-02-15 
16:48:19 UTC ---

(In reply to comment #4)

 Perhaps for regmove IRA classes should be set up unconditionally:

 

 Index: regmove.c

 ===

 --- regmove.c   (revision 196074)

 +++ regmove.c   (working copy)

 @@ -1234,8 +1234,9 @@ regmove_optimize (void)

regstat_init_n_sets_and_refs ();

regstat_compute_ri ();

 

 -  if (flag_ira_loop_pressure)

 -ira_set_pseudo_classes (true, dump_file);

 +  /* Set up register classes for pseudos, so that reg_preferred_class

 + returns a more useful result.  */

 +  ira_set_pseudo_classes (true, dump_file);

 

regno_src_regno = XNEWVEC (int, nregs);

for (i = nregs; --i = 0; )

 @@ -1256,8 +1257,7 @@ regmove_optimize (void)

  }

regstat_free_n_sets_and_refs ();

regstat_free_ri ();

 -  if (flag_ira_loop_pressure)

 -free_reg_info ();

 +  free_reg_info ();

return 0;

  }



It can be a solution.  I see only one drawback, it is expensive.  Setting

classes is expensive procedure requiring 2 passes over all insns, their

alternatives,and classes for each pseudo operand.



In general, it still will not work for other cases.  We are lucky that xmm0

forms own class SSE_FIRST_REG.  Regmove for general cases should see hard regs

not classes.



This is not the first PR about regmove.  I'd like to remove big part of regmove

concerning matching operands as IRA/LRA can deal with this.  Unfortunately, not

too well when hard regs exposed in RTL and some work should be done to improve

this code.  I am going to do for gcc4.9.