[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-06-23 Thread ubizjak at gmail dot com


--- Comment #15 from ubizjak at gmail dot com  2007-06-23 10:00 ---
(In reply to comment #14)
> Subject: Bug 31090
> 
> Author: dnovillo
> Date: Wed Apr 11 17:14:06 2007
> New Revision: 123719


HJ, what is the situation w.r.t performance regression after the above patch
was committed to SVN?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-06-23 Thread hjl at lucon dot org


--- Comment #16 from hjl at lucon dot org  2007-06-23 14:46 ---
> HJ, what is the situation w.r.t performance regression after the above patch
> was committed to SVN?


I think it still needs tuning. I got the following on Linux/Intel64:

Benchmark          (r125740 - r121297)/r121297
410.bwaves         -0.869565%
416.gamess         -0.574713%
433.milc           -0.840336%
434.zeusmp          3.37838%
435.gromacs        -0.214823%
436.cactusADM      -0.917431%
437.leslie3d      -11.9048%
444.namd           -0.671141%
447.dealII         -2.33463%
450.soplex         -0.675676%
453.povray          2.51256%
454.calculix       25.8675%
459.GemsFDTD        0%
465.tonto           2.14286%
470.lbm            -2.75862%
481.wrf             5.94059%
482.sphinx3         0.510204%
SPECfp_base2006     0.757576%
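
As a rough aid to reading the table, the sketch below restates the column header's formula, (new - old)/old, and picks out the largest swings. The function name and the chosen subset of rows are illustrative only, and it assumes the figures are SPEC ratios where higher is better, which the thread does not state explicitly.

```python
def relative_change(new, old):
    """Return (new - old) / old as a percentage."""
    return (new - old) / old * 100.0

# Percent changes copied from the table above (r125740 vs. r121297).
# Only a handful of rows are included to keep the sketch short.
changes = {
    "434.zeusmp": 3.37838,
    "437.leslie3d": -11.9048,
    "454.calculix": 25.8675,
    "470.lbm": -2.75862,
    "481.wrf": 5.94059,
}

# Under the higher-is-better assumption, the minimum is the largest drop.
worst = min(changes, key=changes.get)
best = max(changes, key=changes.get)
print(f"largest drop: {worst} ({changes[worst]:.2f}%)")
print(f"largest gain: {best} ({changes[best]:.2f}%)")
```

Under that assumption, 437.leslie3d is the benchmark that most clearly "still needs tuning", while the aggregate SPECfp_base2006 figure is slightly positive.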


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-04-11 Thread dnovillo at gcc dot gnu dot org


--- Comment #14 from dnovillo at gcc dot gnu dot org  2007-04-11 17:14 ---
Subject: Bug 31090

Author: dnovillo
Date: Wed Apr 11 17:14:06 2007
New Revision: 123719

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=123719
Log:

PR 30735
PR 31090
* doc/invoke.texi: Document --params max-aliased-vops and
avg-aliased-vops.
* tree-ssa-operands.h (get_mpt_for, dump_memory_partitions,
debug_memory_partitions): Move to tree-flow.h
* params.h (AVG_ALIASED_VOPS): Define.
* tree-ssa-alias.c (struct mp_info_def): Remove.  Update all
users.
(mp_info_t): Likewise.
(get_mem_sym_stats_for): New.
(set_memory_partition): Move from tree-flow-inline.h.
(mark_non_addressable): Only clear the set of symbols for the
partition if it exists.
(dump_memory_partitions): Move from tree-ssa-operands.c
(debug_memory_partitions): Likewise.
(need_to_partition_p): New.
(dump_mem_ref_stats): New.
(debug_mem_ref_stats): New.
(dump_mem_sym_stats): New.
(debug_mem_sym_stats): New.
(update_mem_sym_stats_from_stmt): New.
(compare_mp_info_entries): New.
(mp_info_cmp): Call it.
(sort_mp_info): Change argument to a list of mem_sym_stats_t
objects.
(get_mpt_for): Move from tree-ssa-operands.c.
(find_partition_for): New.
(create_partition_for): Remove.
(estimate_vop_reduction): New.
(update_reference_counts): New.
(build_mp_info): New.
(compute_memory_partitions): Refactor.
Document new heuristic.
Call build_mp_info, update_reference_counts,
find_partition_for and estimate_vop_reduction.
(compute_may_aliases): Populate virtual operands before
calling debugging dumps.
(delete_mem_sym_stats): New.
(delete_mem_ref_stats): New.
(init_mem_ref_stats): New.
(init_alias_info): Call it.
(maybe_create_global_var): Remove alias_info argument.
Get number of call sites and number of pure/const call sites
from gimple_mem_ref_stats().
(dump_alias_info): Call dump_memory_partitions first.
(dump_points_to_info_for): Show how many times a pointer has
been dereferenced.
* opts.c (decode_options): For -O2 set --param
max-aliased-vops to 500.
For -O3 set --param max-aliased-vops to 1000 and --param
avg-aliased-vops to 3.
* fortran/options.c (gfc_init_options): Remove assignment to
MAX_ALIASED_VOPS.
* tree-flow-inline.h (gimple_mem_ref_stats): New.
* tree-dfa.c (dump_variable): Dump memory reference
statistics.
Dump NO_ALIAS* settings.
(referenced_var_lookup): Tidy.
(mem_sym_stats): New.
* tree-ssa-copy.c (may_propagate_copy): Return true if DEST
and ORIG are different SSA names for a memory partition.
* tree-ssa.c (delete_tree_ssa): Call delete_mem_ref_stats.
* tree-flow.h (struct mem_sym_stats_d): Define.
(mem_sym_stats_t): Define.
(struct mem_ref_stats_d): Define.
(struct gimple_df): Add field mem_ref_stats.
(enum noalias_state): Define.
(struct var_ann_d): Add bitfield noalias_state.
(mem_sym_stats, delete_mem_ref_stats, dump_mem_ref_stats,
debug_mem_ref_stats, debug_memory_partitions,
debug_mem_sym_stats): Declare.
* tree-ssa-structalias.c (update_alias_info): Update call
sites, pure/const call sites and asm sites in structure
returned by gimple_mem_ref_stats.
Remove local variable IS_POTENTIAL_DEREF.
Increase NUM_DEREFS if the memory expression is a potential
dereference.
Call update_mem_sym_stats_from_stmt.
If the memory references memory, call
update_mem_sym_stats_from_stmt for all the direct memory
symbol references found.
(intra_create_variable_infos): Set noalias_state field for
pointer arguments according to the value of
flag_argument_noalias.
* tree-ssa-structalias.h (struct alias_info): Remove fields
num_calls_found and num_pure_const_calls_found.
(update_mem_sym_stats_from_stmt): Declare.
* params.def (PARAM_MAX_ALIASED_VOPS): Change description.
Set default value to 100.
(PARAM_AVG_ALIASED_VOPS): Define.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/doc/invoke.texi
trunk/gcc/fortran/options.c
trunk/gcc/opts.c
trunk/gcc/params.def
trunk/gcc/params.h
trunk/gcc/tree-dfa.c
trunk/gcc/tree-flow-inline.h
trunk/gcc/tree-flow.h
trunk/gcc/tree-ssa-alias.c
trunk/gcc/tree-ssa-copy.c
trunk/gcc/tree-ssa-operands.c
trunk/gcc/tree-ssa-operands.h
trunk/gcc/tree-ssa-structalias.c
trunk/gcc/tree-ssa-structalias.h
trunk/gcc/tree-ssa.c
trunk/gcc/tree-vrp.c


-- 



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread rguenth at gcc dot gnu dot org


--- Comment #2 from rguenth at gcc dot gnu dot org  2007-03-09 10:32 ---
It looks like a no-op change.  For reference:

--- ChangeLog   (revision 121301)
+++ ChangeLog   (revision 121302)
@@ -1,3 +1,41 @@
+2007-01-28  Daniel Berlin  [EMAIL PROTECTED]
+
+   * tree.h (struct tree_memory_tag): Add aliases member.
+   (MTAG_ALIASES): New macro.
+   * tree-ssa-alias.c (alias_bitmap_obstack): New variable.
+   (add_may_alias): Remove pointer-set. Update for may_aliases being
+   a bitmap. 
+   (mark_aliases_call_clobbered): Update for may_aliases being a
+   bitmap.
+   (compute_tag_properties): Ditto.
+   (create_partition_for): Ditto.
+   (compute_memory_partitions): Ditto.
+   (dump_may_aliases_for): Ditto.
+   (is_aliased_with): Ditto.
+   (add_may_alias_for_new_tag): Ditto.
+   (rewrite_alias_set_for): Rewrite for may_aliases being a bitmap.
+   (compute_is_aliased): New function.
+   (compute_may_aliases): Call compute_is_aliased).
+   (init_alias_info): Initialize alias_bitmap_obstack.
+   (union_alias_set_into): New function.
+   (compute_flow_sensitive_aliasing): Use union_aliases_into.
+   (have_common_aliases_p): Rewrite to take two bitmaps and use
+   intersection.
+   (compute_flow_insensitive_aliasing): Stop using pointer-sets.
+   Update for bitmaps.
+   (finalize_ref_all_pointers): Update for add_may_alias changes.
+   (new_type_alias): Ditto.
+   * tree-flow-inline.h (may_aliases): Return a bitmap.
+   * tree-dfa.c (dump_variable): Check for MTAG_P'ness.
+   * tree-ssa.c (verify_flow_insensitive_alias_info): Update for
+   may_aliases being a bitmap.
+   * tree-flow.h (struct var_ann_d): Remove may_aliases member.
+   may_aliases now returns a bitmap.
+   * tree-ssa-structalias.c (merge_smts_into): Update for may_aliases
+   being a bitmap.
+   * tree-ssa-operands.c (add_virtual_operand): Update for
+   may_aliases being a bitmap.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread hjl at lucon dot org


--- Comment #3 from hjl at lucon dot org  2007-03-09 17:11 ---
Gcc 4.3 revision 122738 has the same issue.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread hjl at lucon dot org


--- Comment #4 from hjl at lucon dot org  2007-03-09 17:40 ---
Created an attachment (id=13180)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13180&action=view)
A breakdown testcase

xxx.f90 is the problem. -fdump-tree-all shows that revision 121302
generates very different alias info.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread dberlin at dberlin dot org


--- Comment #5 from dberlin at gcc dot gnu dot org  2007-03-09 17:51 ---
Subject: Re:  Revision 121302 causes 30% performance regression

Could you attach dumps for -fdump-tree-alias-vops-details-blocks-stats
(tar'ing up the resulting dumps is fine) for before and after?

I don't have a clean tree around, and it would really help.
(If not, it's okay, it would just make my analysis faster).

On 9 Mar 2007 17:40:50 -, hjl at lucon dot org
[EMAIL PROTECTED] wrote:


> --- Comment #4 from hjl at lucon dot org  2007-03-09 17:40 ---
> Created an attachment (id=13180)
>  -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13180&action=view)
> A breakdown testcase
>
> xxx.f90 is the problem. -fdump-tree-all shows that revision 121302
> generates very different alias info.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread hjl at lucon dot org


--- Comment #6 from hjl at lucon dot org  2007-03-09 17:57 ---
Created an attachment (id=13181)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13181&action=view)
Dumps of -fdump-tree-alias-vops-details-blocks-stats

121301 is from revision 121301 and 121302 is from revision 121302.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread rguenth at gcc dot gnu dot org


--- Comment #7 from rguenth at gcc dot gnu dot org  2007-03-09 23:00 ---
The obvious difference is more precise alias information:

-bar: Maximum number of VOPS needed per statement: 80
+bar: Maximum number of VOPS needed per statement: 71

(and the resulting different partitioning).

What may cause performance problems are changes like

-  # VUSE MPT.120_683
+  # VUSE SMT.109_703, MPT.120_704
   D.1662_631 = (*D.1657_625)[D.1661_630];

-  # MPT.120_690 = VDEF MPT.120_683
+  # SMT.109_711 = VDEF SMT.109_703
+  # MPT.120_712 = VDEF MPT.120_704
   (*D.1634_608)[D.1644_621] = D.1666_635;

as our tree memory optimizers don't like statements with multiple virtual
operands (at least store_ccp and store_copyprop don't, maybe PRE doesn't care,
I'm also not sure about LIM).  Other than that, more precise alias information
can cause more register pressure, too.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread pinskia at gmail dot com


--- Comment #8 from pinskia at gmail dot com  2007-03-09 23:05 ---
Subject: Re:  Revision 121302 causes 30% performance regression

On 9 Mar 2007 23:00:55 -, rguenth at gcc dot gnu dot org
[EMAIL PROTECTED] wrote:

> Other than that, more precise alias information
> can cause more register pressure, too.

Fix the register allocator instead of complaining about this issue.  I
am sorry but if people want a compiler which works for x86, they just
need to fix the RA.  Now I could care less about x86.

-- Pinski


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread rguenth at gcc dot gnu dot org


--- Comment #9 from rguenth at gcc dot gnu dot org  2007-03-09 23:17 ---
I didn't complain about register pressure.  You need to get a thicker skin.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread dberlin at gcc dot gnu dot org


--- Comment #10 from dberlin at gcc dot gnu dot org  2007-03-09 23:48 ---
(In reply to comment #7)
> The obvious difference is more precise alias information:
> 
> -bar: Maximum number of VOPS needed per statement: 80
> +bar: Maximum number of VOPS needed per statement: 71

See, it was buggy before: duplicated symbols were still getting into the
points-to sets.

In particular:
 80  NMT.114, UID 1780, real8[0:], is addressable, is global, call clobbered (,
is global var ), may aliases: { dyinv dxinv dzinv epsinv SMT.109 dyinv dxinv
dzinv epsinv SMT.109 dyinv dxinv dzinv epsinv SMT.109 dyinv dxinv dzinv epsinv
SMT.109 dyinv dxinv dzinv epsinv SMT.109 dyinv dxinv dzinv epsinv SMT.109 dyinv
dxinv dzinv epsinv SMT.109 dyinv dxinv dzinv epsinv SMT.109 dyinv dxinv dzinv
epsinv SMT.109 dyinv dxinv dzinv epsinv SMT.109 dyinv dxinv dzinv epsinv
SMT.109 dyinv dxinv dzinv epsinv SMT.109 dyinv dxinv dzinv epsinv SMT.109 dyinv
dxinv dzinv epsinv SMT.109 dyinv dxinv dzinv epsinv SMT.109 dyinv dxinv dzinv
epsinv SMT.109 }

Note that it's the same set of five symbols repeated 16 times (80 entries),
vs.

 5  NMT.114, UID 1780, real8[0:], is addressable, is global, call clobbered (,
is global var ), may aliases: { dyinv dxinv dzinv epsinv SMT.109 }
 

This caused different partitioning, and that in turn caused your performance
drop. Sorry, there is nothing I can do about this; there is no bug here (in
fact, it just shows that the patch removed a bug).
You need Diego to look at the partitioning algorithms to get your performance
back.
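
A toy illustration of the effect described above, not GCC's actual data structures: if a may-alias collection is kept as a plain list, repeated insertions inflate its size, whereas a set (standing in here for the bitmap introduced by revision 121302) counts each symbol once. The symbol names are taken from the dump; the variable names are hypothetical.

```python
# The five distinct symbols that appear in the NMT.114 dump.
symbols = ["dyinv", "dxinv", "dzinv", "epsinv", "SMT.109"]

# Sketch of the old behaviour: the 80-entry dump holds the same five
# symbols 16 times over (80 / 5).
alias_list = symbols * 16

# Sketch of the new behaviour: a set deduplicates, much as a bitmap
# indexed by symbol UID would.
alias_set = set(alias_list)

print(len(alias_list))  # 80
print(len(alias_set))   # 5
```

Any heuristic that compares the size of the collection against a threshold sees a very different number in the two cases, which is consistent with the partitioning changing between the two revisions.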


-- 

dberlin at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dnovillo at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread dnovillo at gcc dot gnu dot org


--- Comment #11 from dnovillo at gcc dot gnu dot org  2007-03-09 23:53 ---

I'm already handling this family of performance problems.  I need a few more
days to finish the WIP patch I have.  In the meantime, see if increasing
--param max-aliased-vops works around the problem.


-- 

dnovillo at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |30735


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread hjl at lucon dot org


--- Comment #12 from hjl at lucon dot org  2007-03-10 00:04 ---
--param max-aliased-vops=100 works:

[EMAIL PROTECTED] 597]$ make
/usr/gcc-next/bin/gfortran -c -O2 --param max-aliased-vops=100 -o 301.o test597.f90
/usr/gcc-4.3/bin/gfortran -o 301 301.o -Wl,-rpath,/usr/gcc-4.3/lib64
/usr/gcc-last/bin/gfortran -c -O2 --param max-aliased-vops=100 -o 302.o test597.f90
/usr/gcc-4.3/bin/gfortran -o 302 302.o -Wl,-rpath,/usr/gcc-4.3/lib64
/usr/gcc-last/bin/gfortran -c -O2 --param max-aliased-vops=100 -o 43.o test597.f90
/usr/gcc-4.3/bin/gfortran -o 43 43.o -Wl,-rpath,/usr/gcc-4.3/lib64
time ./301
0.86user 0.00system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2326minor)pagefaults 0swaps
time ./302
0.86user 0.00system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2326minor)pagefaults 0swaps
time ./43
0.86user 0.00system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2327minor)pagefaults 0swaps
[EMAIL PROTECTED] 597]$ 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-09 Thread dnovillo at redhat dot com


--- Comment #13 from dnovillo at redhat dot com  2007-03-10 00:07 ---
Subject: Re:  Revision 121302 causes 30% performance regression

hjl at lucon dot org wrote on 03/09/07 19:04:

> --param max-aliased-vops=100 works:

OK, thanks.  I'll add this PR to the mix then.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090



[Bug tree-optimization/31090] Revision 121302 causes 30% performance regression

2007-03-08 Thread hjl at lucon dot org


--- Comment #1 from hjl at lucon dot org  2007-03-08 20:04 ---
Created an attachment (id=13173)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13173&action=view)
A testcase

/usr/gcc-next/bin/gfortran -c -O2 -o 301.o test597.f90
/usr/gcc-next/bin/gfortran -o 301 301.o -Wl,-rpath,/usr/gcc-4.3/lib64
/usr/gcc-last/bin/gfortran -c -O2 -o 302.o test597.f90
/usr/gcc-next/bin/gfortran -o 302 302.o -Wl,-rpath,/usr/gcc-4.3/lib64
time ./301
0.92user 0.00system 0:00.93elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2325minor)pagefaults 0swaps
time ./302
1.24user 0.00system 0:01.24elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2325minor)pagefaults 0swaps
[EMAIL PROTECTED] 597]$ /usr/gcc-next/bin/gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /net/gnu-13/export/gnu/src/gcc-next/gcc/configure
--enable-clocale=gnu --with-system-zlib --enable-decimal-float=yes
--with-demangler-in-ld --enable-languages=c,fortran --enable-shared
--enable-threads=posix --enable-haifa --enable-checking=assert
--prefix=/usr/gcc-next --with-local-prefix=/usr/local
Thread model: posix
gcc version 4.3.0 20070129 (experimental) [trunk revision 121301]
[EMAIL PROTECTED] 597]$ /usr/gcc-last/bin/gfortran -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /net/gnu-13/export/gnu/src/gcc-last/gcc/configure
--enable-clocale=gnu --with-system-zlib --enable-decimal-float=yes
--with-demangler-in-ld --enable-languages=c,fortran --enable-shared
--enable-threads=posix --enable-haifa --enable-checking=assert
--prefix=/usr/gcc-last --with-local-prefix=/usr/local
Thread model: posix
gcc version 4.3.0 20070129 (experimental) [trunk revision 121302]
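
For reference, the slowdown implied by the two `time` runs above can be worked out as follows. This is a small sketch using the reported user times; the helper name is hypothetical.

```python
def slowdown_percent(before, after):
    """Percentage increase in run time going from `before` to `after`."""
    return (after - before) / before * 100.0

# User times in seconds from the runs above.
t_301 = 0.92  # binary built with trunk revision 121301
t_302 = 1.24  # binary built with trunk revision 121302

print(f"{slowdown_percent(t_301, t_302):.1f}% slower")
```

The result is a little under 35%, in line with the roughly 30% regression named in the bug title.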


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31090