[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code

2008-12-09 Thread Joey dot ye at intel dot com


--- Comment #12 from Joey dot ye at intel dot com  2008-12-10 03:01 ---
Fixed at trunk 142631


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code

2008-12-09 Thread hjl dot tools at gmail dot com


--- Comment #13 from hjl dot tools at gmail dot com  2008-12-10 05:02 ---
Fixed.


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code

2008-12-06 Thread steven at gcc dot gnu dot org


--- Comment #11 from steven at gcc dot gnu dot org  2008-12-06 22:05 ---
What's the status of this bug? Fixed?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code

2008-11-10 Thread vmakarov at redhat dot com


--- Comment #8 from vmakarov at redhat dot com  2008-11-10 16:12 ---
  H.J., thanks for finding the problem and reducing the test case.

  The problem could be solved by using extended register coalescing.  Currently
IRA coalesces only move insns (-fira-coalesce).  Unfortunately, using
-fira-coalesce produces worse code in the general case.  Register preferencing
based on hard register costs generally works better in IRA.

  Therefore I tried to find out what is wrong with the hard register cost
calculation: why are loading a register from its equivalent memory location and
3 uses of that register in the loop of 36.c considered cheaper than 1 def and 1
use of another pseudo-register (that is the major difference from the old
register allocator)?  The problem is that the GENERAL_REGS class is used to
calculate the saving from loading a pseudo from its equivalent memory, instead
of the pseudo's cover class (SSE_REGS).  That gives a cost of 12 instead of 6,
which results in the wrong choice for spilling and in worse code.

  I'll send a patch fixing this problem a bit later today.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code

2008-11-10 Thread vmakarov at gcc dot gnu dot org


--- Comment #9 from vmakarov at gcc dot gnu dot org  2008-11-10 23:23 ---
Subject: Bug 37948

Author: vmakarov
Date: Mon Nov 10 23:21:45 2008
New Revision: 141753

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141753
Log:
2008-11-07  Vladimir Makarov  [EMAIL PROTECTED]

PR rtl-optimizations/37948
* ira-int.h (struct ira_allocno_copy): New member constraint_p.
(ira_create_copy, ira_add_allocno_copy): New parameter.

* ira-conflicts.c (process_regs_for_copy): New parameter.  Pass it
to ira_add_allocno_copy.
(process_reg_shuffles, add_insn_allocno_copies): Pass a new
parameter to process_regs_for_copy.
(propagate_copies): Pass a new parameter to ira_add_allocno_copy.
Fix typo in passing second allocno to ira_add_allocno_copy.

* ira-color.c (update_conflict_hard_regno_costs): Use head of
coalesced allocnos list.
(assign_hard_reg): Ditto.  Check that assigned allocnos are not in
the graph.
(add_ira_allocno_to_bucket): Rename to add_allocno_to_bucket.
(add_ira_allocno_to_ordered_bucket): Rename to
add_allocno_to_ordered_bucket.
(push_ira_allocno_to_stack): Rename to push_allocno_to_stack.  Use
head of coalesced allocnos list.
(push_allocnos_to_stack): Remove calculation of ALLOCNO_TEMP.
Check that it is already calculated.
(push_ira_allocno_to_spill): Rename to push_allocno_to_spill.
(setup_allocno_left_conflicts_num): Use head of coalesced allocnos
list.
(coalesce_allocnos): Do extended coalescing too.

* ira-emit.c (add_range_and_copies_from_move_list): Pass a new
parameter to ira_add_allocno_copy.

* ira-build.c (ira_create_copy, ira_add_allocno_copy): Add a new
parameter.
(print_copy): Print copy origination too.

* ira-costs.c (scan_one_insn): Use alloc_pref for load from
equivalent memory.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/ira-build.c
trunk/gcc/ira-color.c
trunk/gcc/ira-conflicts.c
trunk/gcc/ira-costs.c
trunk/gcc/ira-emit.c
trunk/gcc/ira-int.h


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code

2008-11-10 Thread hjl at gcc dot gnu dot org


--- Comment #10 from hjl at gcc dot gnu dot org  2008-11-11 00:01 ---
Subject: Bug 37948

Author: hjl
Date: Mon Nov 10 23:59:57 2008
New Revision: 141756

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141756
Log:
2008-11-10  H.J. Lu  [EMAIL PROTECTED]

Backport from mainline:
2008-11-10  Vladimir Makarov  [EMAIL PROTECTED]

PR rtl-optimizations/37948
* ira-int.h (struct ira_allocno_copy): New member constraint_p.
(ira_create_copy, ira_add_allocno_copy): New parameter.

* ira-conflicts.c (process_regs_for_copy): New parameter.  Pass it
to ira_add_allocno_copy.
(process_reg_shuffles, add_insn_allocno_copies): Pass a new
parameter to process_regs_for_copy.
(propagate_copies): Pass a new parameter to ira_add_allocno_copy.
Fix typo in passing second allocno to ira_add_allocno_copy.

* ira-color.c (update_conflict_hard_regno_costs): Use head of
coalesced allocnos list.
(assign_hard_reg): Ditto.  Check that assigned allocnos are not in
the graph.
(add_ira_allocno_to_bucket): Rename to add_allocno_to_bucket.
(add_ira_allocno_to_ordered_bucket): Rename to
add_allocno_to_ordered_bucket.
(push_ira_allocno_to_stack): Rename to push_allocno_to_stack.  Use
head of coalesced allocnos list.
(push_allocnos_to_stack): Remove calculation of ALLOCNO_TEMP.
Check that it is already calculated.
(push_ira_allocno_to_spill): Rename to push_allocno_to_spill.
(setup_allocno_left_conflicts_num): Use head of coalesced allocnos
list.
(coalesce_allocnos): Do extended coalescing too.

* ira-emit.c (add_range_and_copies_from_move_list): Pass a new
parameter to ira_add_allocno_copy.

* ira-build.c (ira_create_copy, ira_add_allocno_copy): Add a new
parameter.
(print_copy): Print copy origination too.

* ira-costs.c (scan_one_insn): Use alloc_pref for load from
equivalent memory.

Modified:
branches/ira-merge/gcc/ChangeLog.ira
branches/ira-merge/gcc/ira-build.c
branches/ira-merge/gcc/ira-color.c
branches/ira-merge/gcc/ira-conflicts.c
branches/ira-merge/gcc/ira-costs.c
branches/ira-merge/gcc/ira-emit.c
branches/ira-merge/gcc/ira-int.h


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code

2008-11-04 Thread hjl dot tools at gmail dot com


--- Comment #7 from hjl dot tools at gmail dot com  2008-11-04 19:36 ---
IRA generates much slower code:

[EMAIL PROTECTED] 37364]$ make
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2 -m32 -fno-ira -o noira foo.c
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2 -m32 -o ira foo.c
time ./noira
7.62user 0.01system 0:07.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+88minor)pagefaults 0swaps
time ./ira
7.81user 0.01system 0:07.83elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
[EMAIL PROTECTED] 37364]$ make
time ./noira
7.07user 0.01system 0:07.10elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+88minor)pagefaults 0swaps
time ./ira
7.96user 0.00system 0:07.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
[EMAIL PROTECTED] 37364]$ make
time ./noira
7.40user 0.00system 0:07.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
time ./ira
7.81user 0.00system 0:07.82elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
[EMAIL PROTECTED] 37364]$ 


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.4 Regression] IRA        |[4.4 Regression] IRA
                   |generates slower code for   |generates slower code
                   |-mtune=core2                |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2

2008-10-30 Thread rguenth at gcc dot gnu dot org


--- Comment #5 from rguenth at gcc dot gnu dot org  2008-10-30 21:05 ---
So, is this a target issue or a register allocator issue now?  Has the costs
fix been applied?


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2

2008-10-30 Thread hjl dot tools at gmail dot com


--- Comment #6 from hjl dot tools at gmail dot com  2008-10-30 22:51 ---
(In reply to comment #5)
> So, is this a target issue or a register allocator issue now?  Has the costs
> fix been applied?

It is an IRA issue since -fno-ira is still faster with -mtune=generic.
IRA should be fixed first before changing Core 2 cost.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2

2008-10-29 Thread bonzini at gnu dot org


--- Comment #2 from bonzini at gnu dot org  2008-10-29 07:17 ---
Subject: Re:  [4.4 Regression] IRA generates slower
 code for -mtune=core2

hjl dot tools at gmail dot com wrote:
> --- Comment #1 from hjl dot tools at gmail dot com  2008-10-29 05:44 ---
> It looks like the cost of loading/storing FP values aren't appropriate for
> Core 2. With this patch:

Good.  Is regmove still helping (which would be the wrong thing to do,
but gives a data point)?

Paolo


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2

2008-10-29 Thread pinskia at gmail dot com


--- Comment #3 from pinskia at gmail dot com  2008-10-29 07:25 ---
Subject: Re:  [4.4 Regression] IRA generates slower code for -mtune=core2



Sent from my iPhone

On Oct 29, 2008, at 12:17 AM, bonzini at gnu dot org [EMAIL PROTECTED]
wrote:

> --- Comment #2 from bonzini at gnu dot org  2008-10-29 07:17 ---
> Subject: Re:  [4.4 Regression] IRA generates slower
>  code for -mtune=core2
>
> hjl dot tools at gmail dot com wrote:
> > --- Comment #1 from hjl dot tools at gmail dot com  2008-10-29 05:44 ---
> > It looks like the cost of loading/storing FP values aren't appropriate for
> > Core 2. With this patch:
>
> Good.  Is regmove still helping (which would be the wrong thing to do,
> but gives a data point)?

I noticed that IRA ignores the '!' part of the constraint.  Do you know if the
register class would change if '!' were not ignored?

Thanks,
Andrew Pinski

> Paolo
>
> --
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2

2008-10-29 Thread hjl dot tools at gmail dot com


--- Comment #4 from hjl dot tools at gmail dot com  2008-10-29 13:08 ---
(In reply to comment #2)
> Subject: Re:  [4.4 Regression] IRA generates slower
>  code for -mtune=core2
>
> hjl dot tools at gmail dot com wrote:
> > --- Comment #1 from hjl dot tools at gmail dot com  2008-10-29 05:44 ---
> > It looks like the cost of loading/storing FP values aren't appropriate for
> > Core 2. With this patch:
>
> Good.  Is regmove still helping (which would be the wrong thing to do,
> but gives a data point)?
>
> Paolo
>

For this bug, regmove has a mixed impact with the updated core2_cost:

[EMAIL PROTECTED] regmove]$ ../xgcc -B../ -m32 -O2 /tmp/foo.c -o core2.sse -mtune=core2 -msse3 -mfpmath=sse
[EMAIL PROTECTED] regmove]$ ../xgcc -B../ -m32 -O2 /tmp/foo.c -o o2.sse -msse3 -mfpmath=sse
[EMAIL PROTECTED] regmove]$ ../xgcc -B../ -m32 -O2 /tmp/foo.c -o core2 -mtune=core2
[EMAIL PROTECTED] regmove]$ ../xgcc -B../ -m32 -O2 /tmp/foo.c -o o2
[EMAIL PROTECTED] regmove]$ time ./o2

real    0m7.995s
user    0m7.956s
sys     0m0.003s
[EMAIL PROTECTED] regmove]$ time ./core2

real    0m7.951s
user    0m7.950s
sys     0m0.000s
[EMAIL PROTECTED] regmove]$ time ./core2.sse

real    0m7.358s
user    0m7.357s
sys     0m0.000s
[EMAIL PROTECTED] regmove]$ time ./o2.sse

real    0m7.177s
user    0m7.176s
sys     0m0.000s
[EMAIL PROTECTED] regmove]$


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2

2008-10-29 Thread rguenth at gcc dot gnu dot org


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.4.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948



[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2

2008-10-28 Thread hjl dot tools at gmail dot com


--- Comment #1 from hjl dot tools at gmail dot com  2008-10-29 05:44 ---
It looks like the costs of loading/storing FP values aren't appropriate for
Core 2. With this patch:

[EMAIL PROTECTED] i386]$ diff -up i386.c.foo i386.c
--- i386.c.foo  2008-10-28 21:56:19.0 -0700
+++ i386.c  2008-10-28 22:01:53.0 -0700
@@ -990,9 +990,9 @@ struct processor_costs core2_cost = {
   Relative to reg-reg move (2).  */
   {4, 4, 4},   /* cost of storing integer registers */
   2,   /* cost of reg,reg fld/fst */
-  {6, 6, 6},   /* cost of loading fp registers
+  {12, 12, 12},/* cost of loading fp registers
   in SFmode, DFmode and XFmode */
-  {4, 4, 4},   /* cost of storing fp registers
+  {6, 6, 8},   /* cost of storing fp registers
   in SFmode, DFmode and XFmode */
   2,   /* cost of moving MMX register */
   {6, 6},  /* cost of loading MMX registers
@@ -1000,9 +1000,9 @@ struct processor_costs core2_cost = {
   {4, 4},  /* cost of storing MMX registers
   in SImode and DImode */
   2,   /* cost of moving SSE register */
-  {6, 6, 6},   /* cost of loading SSE registers
+  {8, 8, 8},   /* cost of loading SSE registers
   in SImode, DImode and TImode */
-  {4, 4, 4},   /* cost of storing SSE registers
+  {8, 8, 8},   /* cost of storing SSE registers
   in SImode, DImode and TImode */
   2,   /* MMX or SSE register to integer */
   32,  /* size of l1 cache.  */
[EMAIL PROTECTED] i386]$

I got

[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o core2.sse -mtune=core2 -msse3 -mfpmath=sse
[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o core2 -mtune=core2
[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o o2 -msse3 -mfpmath=sse
[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o o2.sse
[EMAIL PROTECTED] gcc]$ time ./o2

real    0m7.163s
user    0m7.161s
sys     0m0.001s
[EMAIL PROTECTED] gcc]$ time ./core2

real    0m7.833s
user    0m7.829s
sys     0m0.001s
[EMAIL PROTECTED] gcc]$ time ./o2.sse

real    0m7.795s
user    0m7.794s
sys     0m0.000s
[EMAIL PROTECTED] gcc]$ time ./core2.sse

real    0m7.339s
user    0m7.337s
sys     0m0.001s
[EMAIL PROTECTED] gcc]$

But even with this patch, IRA still generates slower code:

[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o core2.noira -mtune=core2 -fno-ira
[EMAIL PROTECTED] gcc]$ time ./core2.noira

real    0m7.444s
user    0m7.441s
sys     0m0.001s
[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o core2.sse.noira -mtune=core2 -fno-ira -msse3 -mfpmath=sse
[EMAIL PROTECTED] gcc]$ time ./core2.sse.noira

real    0m7.229s
user    0m7.224s
sys     0m0.000s
[EMAIL PROTECTED] gcc]$


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948