[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code
--- Comment #12 from Joey dot ye at intel dot com 2008-12-10 03:01 ---
Fixed on trunk in revision 142631.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code
--- Comment #13 from hjl dot tools at gmail dot com 2008-12-10 05:02 ---
Fixed.

hjl dot tools at gmail dot com changed:

What       |Removed     |Added
Status     |UNCONFIRMED |RESOLVED
Resolution |            |FIXED
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code
--- Comment #11 from steven at gcc dot gnu dot org 2008-12-06 22:05 ---
What's the status of this bug? Fixed?
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code
--- Comment #8 from vmakarov at redhat dot com 2008-11-10 16:12 ---
H.J., thanks for finding the problem and reducing the test case. The problem could be solved by extended register coalescing. Currently IRA coalesces only move insns (-fira-coalesce), and unfortunately -fira-coalesce generates worse code in the general case; register preferencing based on hard register costs generally works better in IRA. Therefore I tried to find what is wrong with the hard register cost calculation: why is loading the register from its equivalent memory location, plus 3 uses of the register in the loop of 36.c, considered cheaper than 1 def and 1 use of another pseudo-register (that is the major difference from the old register allocator)? The problem is that the saving from loading the pseudo from its equivalent memory is calculated with the GENERAL_REGS class instead of the pseudo's cover class (SSE_REGS). That gives a cost of 12 instead of 6, which results in the wrong choice of pseudo to spill and worse code. I'll send a patch fixing this problem a bit later today.
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code
--- Comment #9 from vmakarov at gcc dot gnu dot org 2008-11-10 23:23 ---
Subject: Bug 37948

Author: vmakarov
Date: Mon Nov 10 23:21:45 2008
New Revision: 141753

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141753
Log:
2008-11-07  Vladimir Makarov  [EMAIL PROTECTED]

	PR rtl-optimizations/37948
	* ira-int.h (struct ira_allocno_copy): New member constraint_p.
	(ira_create_copy, ira_add_allocno_copy): New parameter.
	* ira-conflicts.c (process_regs_for_copy): New parameter.  Pass it to ira_add_allocno_copy.
	(process_reg_shuffles, add_insn_allocno_copies): Pass a new parameter to process_regs_for_copy.
	(propagate_copies): Pass a new parameter to ira_add_allocno_copy.  Fix typo in passing second allocno to ira_add_allocno_copy.
	* ira-color.c (update_conflict_hard_regno_costs): Use head of coalesced allocnos list.
	(assign_hard_reg): Ditto.  Check that assigned allocnos are not in the graph.
	(add_ira_allocno_to_bucket): Rename to add_allocno_to_bucket.
	(add_ira_allocno_to_ordered_bucket): Rename to add_allocno_to_ordered_bucket.
	(push_ira_allocno_to_stack): Rename to push_allocno_to_stack.  Use head of coalesced allocnos list.
	(push_allocnos_to_stack): Remove calculation of ALLOCNO_TEMP.  Check that it is already calculated.
	(push_ira_allocno_to_spill): Rename to push_allocno_to_spill.
	(setup_allocno_left_conflicts_num): Use head of coalesced allocnos list.
	(coalesce_allocnos): Do extended coalescing too.
	* ira-emit.c (add_range_and_copies_from_move_list): Pass a new parameter to ira_add_allocno_copy.
	* ira-build.c (ira_create_copy, ira_add_allocno_copy): Add a new parameter.
	(print_copy): Print copy origination too.
	* ira-costs.c (scan_one_insn): Use alloc_pref for load from equivalent memory.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-build.c
    trunk/gcc/ira-color.c
    trunk/gcc/ira-conflicts.c
    trunk/gcc/ira-costs.c
    trunk/gcc/ira-emit.c
    trunk/gcc/ira-int.h
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code
--- Comment #10 from hjl at gcc dot gnu dot org 2008-11-11 00:01 ---
Subject: Bug 37948

Author: hjl
Date: Mon Nov 10 23:59:57 2008
New Revision: 141756

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=141756
Log:
2008-11-10  H.J. Lu  [EMAIL PROTECTED]

	Backport from mainline:
	2008-11-10  Vladimir Makarov  [EMAIL PROTECTED]

	PR rtl-optimizations/37948
	* ira-int.h (struct ira_allocno_copy): New member constraint_p.
	(ira_create_copy, ira_add_allocno_copy): New parameter.
	* ira-conflicts.c (process_regs_for_copy): New parameter.  Pass it to ira_add_allocno_copy.
	(process_reg_shuffles, add_insn_allocno_copies): Pass a new parameter to process_regs_for_copy.
	(propagate_copies): Pass a new parameter to ira_add_allocno_copy.  Fix typo in passing second allocno to ira_add_allocno_copy.
	* ira-color.c (update_conflict_hard_regno_costs): Use head of coalesced allocnos list.
	(assign_hard_reg): Ditto.  Check that assigned allocnos are not in the graph.
	(add_ira_allocno_to_bucket): Rename to add_allocno_to_bucket.
	(add_ira_allocno_to_ordered_bucket): Rename to add_allocno_to_ordered_bucket.
	(push_ira_allocno_to_stack): Rename to push_allocno_to_stack.  Use head of coalesced allocnos list.
	(push_allocnos_to_stack): Remove calculation of ALLOCNO_TEMP.  Check that it is already calculated.
	(push_ira_allocno_to_spill): Rename to push_allocno_to_spill.
	(setup_allocno_left_conflicts_num): Use head of coalesced allocnos list.
	(coalesce_allocnos): Do extended coalescing too.
	* ira-emit.c (add_range_and_copies_from_move_list): Pass a new parameter to ira_add_allocno_copy.
	* ira-build.c (ira_create_copy, ira_add_allocno_copy): Add a new parameter.
	(print_copy): Print copy origination too.
	* ira-costs.c (scan_one_insn): Use alloc_pref for load from equivalent memory.

Modified:
    branches/ira-merge/gcc/ChangeLog.ira
    branches/ira-merge/gcc/ira-build.c
    branches/ira-merge/gcc/ira-color.c
    branches/ira-merge/gcc/ira-conflicts.c
    branches/ira-merge/gcc/ira-costs.c
    branches/ira-merge/gcc/ira-emit.c
    branches/ira-merge/gcc/ira-int.h
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code
--- Comment #7 from hjl dot tools at gmail dot com 2008-11-04 19:36 ---
IRA generates much slower code:

[EMAIL PROTECTED] 37364]$ make
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2 -m32 -fno-ira -o noira foo.c
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/gcc/build-x86_64-linux/gcc/ -O2 -m32 -o ira foo.c
time ./noira
7.62user 0.01system 0:07.65elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+88minor)pagefaults 0swaps
time ./ira
7.81user 0.01system 0:07.83elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
[EMAIL PROTECTED] 37364]$ make
time ./noira
7.07user 0.01system 0:07.10elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+88minor)pagefaults 0swaps
time ./ira
7.96user 0.00system 0:07.97elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
[EMAIL PROTECTED] 37364]$ make
time ./noira
7.40user 0.00system 0:07.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
time ./ira
7.81user 0.00system 0:07.82elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
[EMAIL PROTECTED] 37364]$

hjl dot tools at gmail dot com changed:

What    |Removed                                                     |Added
Summary |[4.4 Regression] IRA generates slower code for -mtune=core2 |[4.4 Regression] IRA generates slower code
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2
--- Comment #5 from rguenth at gcc dot gnu dot org 2008-10-30 21:05 ---
So, is this a target issue or a register allocator issue now? Has the costs fix been applied?

rguenth at gcc dot gnu dot org changed:

What     |Removed |Added
Priority |P3      |P2
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2
--- Comment #6 from hjl dot tools at gmail dot com 2008-10-30 22:51 ---
(In reply to comment #5)
> So, is this a target issue or a register allocator issue now? Has the costs
> fix been applied?

It is an IRA issue, since -fno-ira is still faster with -mtune=generic. IRA should be fixed before the Core 2 costs are changed.
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2
--- Comment #2 from bonzini at gnu dot org 2008-10-29 07:17 ---
Subject: Re: [4.4 Regression] IRA generates slower code for -mtune=core2

hjl dot tools at gmail dot com wrote:
> --- Comment #1 from hjl dot tools at gmail dot com 2008-10-29 05:44 ---
> It looks like the costs of loading/storing FP values aren't appropriate for
> Core 2. With this patch:

Good. Is regmove still helping (which would be the wrong thing to do, but it gives a data point)?

Paolo
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2
--- Comment #3 from pinskia at gmail dot com 2008-10-29 07:25 ---
Subject: Re: [4.4 Regression] IRA generates slower code for -mtune=core2

On Oct 29, 2008, at 12:17 AM, bonzini at gnu dot org [EMAIL PROTECTED] wrote:
> Good. Is regmove still helping (which would be the wrong thing to do, but
> gives a data point)?

I noticed that IRA ignores the ! part of the constraint. Do you know if the register class would change if ! was not ignored?

Thanks,
Andrew Pinski
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2
--- Comment #4 from hjl dot tools at gmail dot com 2008-10-29 13:08 ---
(In reply to comment #2)
> hjl dot tools at gmail dot com wrote:
> > It looks like the costs of loading/storing FP values aren't appropriate for
> > Core 2. With this patch:
>
> Good. Is regmove still helping (which would be the wrong thing to do, but
> gives a data point)?

For this bug, regmove has a mixed impact with the updated core2_cost:

[EMAIL PROTECTED] regmove]$ ../xgcc -B../ -m32 -O2 /tmp/foo.c -o core2.sse -mtune=core2 -msse3 -mfpmath=sse
[EMAIL PROTECTED] regmove]$ ../xgcc -B../ -m32 -O2 /tmp/foo.c -o o2.sse -msse3 -mfpmath=sse
[EMAIL PROTECTED] regmove]$ ../xgcc -B../ -m32 -O2 /tmp/foo.c -o core2 -mtune=core2
[EMAIL PROTECTED] regmove]$ ../xgcc -B../ -m32 -O2 /tmp/foo.c -o o2
[EMAIL PROTECTED] regmove]$ time ./o2
real 0m7.995s
user 0m7.956s
sys  0m0.003s
[EMAIL PROTECTED] regmove]$ time ./core2
real 0m7.951s
user 0m7.950s
sys  0m0.000s
[EMAIL PROTECTED] regmove]$ time ./core2.sse
real 0m7.358s
user 0m7.357s
sys  0m0.000s
[EMAIL PROTECTED] regmove]$ time ./o2.sse
real 0m7.177s
user 0m7.176s
sys  0m0.000s
[EMAIL PROTECTED] regmove]$
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2
rguenth at gcc dot gnu dot org changed:

What             |Removed |Added
Target Milestone |---     |4.4.0
[Bug rtl-optimization/37948] [4.4 Regression] IRA generates slower code for -mtune=core2
--- Comment #1 from hjl dot tools at gmail dot com 2008-10-29 05:44 ---
It looks like the costs of loading/storing FP values aren't appropriate for Core 2. With this patch:

[EMAIL PROTECTED] i386]$ diff -up i386.c.foo i386.c
--- i386.c.foo  2008-10-28 21:56:19.0 -0700
+++ i386.c      2008-10-28 22:01:53.0 -0700
@@ -990,9 +990,9 @@ struct processor_costs core2_cost = {
                                            Relative to reg-reg move (2).  */
   {4, 4, 4},                            /* cost of storing integer registers */
   2,                                    /* cost of reg,reg fld/fst */
-  {6, 6, 6},                            /* cost of loading fp registers
+  {12, 12, 12},                         /* cost of loading fp registers
                                            in SFmode, DFmode and XFmode */
-  {4, 4, 4},                            /* cost of storing fp registers
+  {6, 6, 8},                            /* cost of storing fp registers
                                            in SFmode, DFmode and XFmode */
   2,                                    /* cost of moving MMX register */
   {6, 6},                               /* cost of loading MMX registers
@@ -1000,9 +1000,9 @@ struct processor_costs core2_cost = {
   {4, 4},                               /* cost of storing MMX registers
                                            in SImode and DImode */
   2,                                    /* cost of moving SSE register */
-  {6, 6, 6},                            /* cost of loading SSE registers
+  {8, 8, 8},                            /* cost of loading SSE registers
                                            in SImode, DImode and TImode */
-  {4, 4, 4},                            /* cost of storing SSE registers
+  {8, 8, 8},                            /* cost of storing SSE registers
                                            in SImode, DImode and TImode */
   2,                                    /* MMX or SSE register to integer */
   32,                                   /* size of l1 cache.  */
[EMAIL PROTECTED] i386]$

I got:

[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o core2.sse -mtune=core2 -msse3 -mfpmath=sse
[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o core2 -mtune=core2
[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o o2 -msse3 -mfpmath=sse
[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o o2.sse
[EMAIL PROTECTED] gcc]$ time ./o2
real 0m7.163s
user 0m7.161s
sys  0m0.001s
[EMAIL PROTECTED] gcc]$ time ./core2
real 0m7.833s
user 0m7.829s
sys  0m0.001s
[EMAIL PROTECTED] gcc]$ time ./o2.sse
real 0m7.795s
user 0m7.794s
sys  0m0.000s
[EMAIL PROTECTED] gcc]$ time ./core2.sse
real 0m7.339s
user 0m7.337s
sys  0m0.001s
[EMAIL PROTECTED] gcc]$

But even with this patch, IRA still generates slower code:

[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o core2.noira -mtune=core2 -fno-ira
[EMAIL PROTECTED] gcc]$ time ./core2.noira
real 0m7.444s
user 0m7.441s
sys  0m0.001s
[EMAIL PROTECTED] gcc]$ ./xgcc -B./ -m32 -O2 /tmp/foo.c -o core2.sse.noira -mtune=core2 -fno-ira -msse3 -mfpmath=sse
[EMAIL PROTECTED] gcc]$ time ./core2.sse.noira
real 0m7.229s
user 0m7.224s
sys  0m0.000s
[EMAIL PROTECTED] gcc]$