[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-10-15 Thread steven at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146



--- Comment #62 from Steven Bosscher steven at gcc dot gnu.org 2012-10-15 
21:32:04 UTC ---

Author: steven

Date: Mon Oct 15 21:31:57 2012

New Revision: 192476



URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=192476

Log:

Backport from trunk (r190222):



PR tree-optimization/54146

* ifcvt.c: Include pointer-set.h.

(cond_move_process_if_block): Change type of then_regs and

else_regs from alloca'd array to pointer_sets.

(check_cond_move_block): Update for this change.

(cond_move_convert_if_block): Likewise.

* Makefile.in: Fix dependencies for ifcvt.o.





Modified:

branches/gcc-4_7-branch/gcc/ChangeLog

branches/gcc-4_7-branch/gcc/Makefile.in

branches/gcc-4_7-branch/gcc/ifcvt.c


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-09-10 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #61 from Jan Hubicka hubicka at ucw dot cz 2012-09-10 16:04:38 
UTC ---
  I think it would be good to deprecate the flatten attribute...
 
 It still can be useful I think, if only for creating testcases
 with arbitrary large functions ;)

I think it is generally useful and some of the HPC extensions do contain
flatten
equivalent.  I do not see anything bad in compiler blowing up your machine when
you
explicitely ask him to do so via an attribute ;)

Honza


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-09-04 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #60 from Steven Bosscher steven at gcc dot gnu.org 2012-09-04 
10:50:47 UTC ---
Another improvement that can be done for the test case, is to assign
pseudo-register numbers such that the density of the DF_LR and DF_LIVE bitmaps
increases.

For the density I used the following definition here:
  size = (# list elts) * (# bits per elt)
  cardinality = (# bits set)
  density = cardinality / size

The average density we achieve for the DF_LR_IN sets of the largest function is
only 12%, and that number is misleading because there are a few elements with
almost all bits set and many elements with only a few bits set. For instance
there is one bitmap with the following statistics:

size = 50
cardinality = 940
density = 14.7%

but the per-element statistics show:

bits/eltcount
120
219
32
42
1247

so excluding the 7 dense elements (124 bits set out of 128), the density is
only 1.3%!

The dense bits are for a bunch of SRA variables, the sparse ones for ivtmp
variables. The ivtmp variables are live in mostly the same blocks as the SRA
variables (10 blocks where the variables appear in DF_LR_IN of the block)
so the packing is extremely inefficient. This is the cause for most of the
slowness of DF on the test case: the bitmap chains are large and extremely
sparse.


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-09-02 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #59 from Steven Bosscher steven at gcc dot gnu.org 2012-09-02 
22:54:34 UTC ---
FWIW Martin: SRA blows up this test case's register pressure. Compiling with
SRA enabled takes ~900s, but with -fno-tree-sra compile time almost halves.
There are extremely long live ranges for SA.* variables created by SRA.


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-21 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #56 from rguenther at suse dot de rguenther at suse dot de 
2012-08-21 07:55:14 UTC ---
On Mon, 20 Aug 2012, steven at gcc dot gnu.org wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146
 
 Steven Bosscher steven at gcc dot gnu.org changed:
 
What|Removed |Added
 
  Status|NEW |RESOLVED
  Resolution||WONTFIX
 
 --- Comment #55 from Steven Bosscher steven at gcc dot gnu.org 2012-08-20 
 19:30:34 UTC ---
 The remaining problem areas are all liveness calculation routines that are
 essentially inherent quadratic problems: DF liveness, IRA liveness, and
 out-of-SSA liveness. 
 
 I think it would be good to deprecate the flatten attribute...

It still can be useful I think, if only for creating testcases
with arbitrary large functions ;)


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-21 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #57 from Richard Guenther rguenth at gcc dot gnu.org 2012-08-21 
13:34:28 UTC ---
Author: rguenth
Date: Tue Aug 21 13:34:19 2012
New Revision: 190562

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190562
Log:
2012-08-21  Richard Guenther  rguent...@suse.de

Backport from mainline
2012-08-16  Richard Guenther  rguent...@suse.de

PR middle-end/54146
* tree-ssa-loop-niter.c (find_loop_niter_by_eval): Free the
exit vector.
* ipa-pure-const.c (analyze_function): Use FOR_EACH_LOOP_BREAK.
* cfgloop.h (FOR_EACH_LOOP_BREAK): Fix.
* tree-ssa-structalias.c (handle_lhs_call): Properly free rhsc.
* tree-ssa-loop-im.c (analyze_memory_references): Adjust.
(tree_ssa_lim_finalize): Free all mem_refs.
* tree-ssa-sccvn.c (extract_and_process_scc_for_name): Free
scc when bailing out.
* modulo-sched.c (sms_schedule): Use FOR_EACH_LOOP_BREAK.
* ira-build.c (loop_with_complex_edge_p): Free loop exit vector.
* graphite-sese-to-poly.c (scop_ivs_can_be_represented): Use
FOR_EACH_LOOP_BREAK.

2012-08-17  Richard Guenther  rguent...@suse.de

* tree-sra.c (modify_function): Free redirect_callers vector.
* ipa-split.c (split_function): Free args_to_pass vector.
* tree-vect-stmts.c (vectorizable_operation): Do not pre-allocate
vec_oprnds.
(new_stmt_vec_info): Do not pre-allocate STMT_VINFO_SAME_ALIGN_REFS.
* tree-vect-slp.c (vect_free_slp_instance): Free the instance.
(vect_analyze_slp_instance): Free everything.
(destroy_bb_vec_info): Free the SLP instances.

2012-08-17  Richard Guenther  rguent...@suse.de

* params.def (integer-share-limit): Decrease from 256 to 251,
add rationale.

2012-08-21  Richard Guenther  rguent...@suse.de

* tree-ssa-loop-im.c (tree_ssa_lim_finalize): Properly free
the affine expansion cache.

Modified:
branches/gcc-4_7-branch/gcc/ChangeLog
branches/gcc-4_7-branch/gcc/cfgloop.h
branches/gcc-4_7-branch/gcc/graphite-sese-to-poly.c
branches/gcc-4_7-branch/gcc/ipa-pure-const.c
branches/gcc-4_7-branch/gcc/ipa-split.c
branches/gcc-4_7-branch/gcc/ira-build.c
branches/gcc-4_7-branch/gcc/modulo-sched.c
branches/gcc-4_7-branch/gcc/params.def
branches/gcc-4_7-branch/gcc/tree-sra.c
branches/gcc-4_7-branch/gcc/tree-ssa-loop-im.c
branches/gcc-4_7-branch/gcc/tree-ssa-loop-niter.c
branches/gcc-4_7-branch/gcc/tree-ssa-sccvn.c
branches/gcc-4_7-branch/gcc/tree-ssa-structalias.c
branches/gcc-4_7-branch/gcc/tree-vect-slp.c
branches/gcc-4_7-branch/gcc/tree-vect-stmts.c


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-21 Thread stevenb.gcc at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #58 from stevenb.gcc at gmail dot com stevenb.gcc at gmail dot 
com 2012-08-21 13:56:27 UTC ---
FWIW, I think all patches addressing parts of this bug are candidates
for back-porting to release branches. They are all almost trivial.


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-20 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher steven at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||WONTFIX

--- Comment #55 from Steven Bosscher steven at gcc dot gnu.org 2012-08-20 
19:30:34 UTC ---
The remaining problem areas are all liveness calculation routines that are
essentially inherent quadratic problems: DF liveness, IRA liveness, and
out-of-SSA liveness. 

I think it would be good to deprecate the flatten attribute...


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-17 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #54 from Steven Bosscher steven at gcc dot gnu.org 2012-08-17 
09:42:15 UTC ---
Author: steven
Date: Fri Aug 17 09:42:06 2012
New Revision: 190475

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190475
Log:
PR middle-end/54146
* tree-ssa-loop-im.c (lim_bitmap_obstack): New bitmap_obstack.
(memref_free): Don't free the bitmaps individually here.
(mem_ref_alloc): Allocate the bitmaps on the new bitmap obstack.
(analyze_memory_references): Likewise.
(tree_ssa_lim_initialize): Initialize the new bitmap obstack.
(tree_ssa_lim_finalize): Release it.
* dse.c (dse_bitmap_obstack): New bitmap obstack.
(dse_obstack): New obstack.
(get_group_info): Allocate the bitmaps on the new bitmap obstack.
(dse_step0): Allocate the scratch bitmap on reg_obstack.  Initialize
the new bitmap obstack and normal obstack.  Use XNEWVEC for bb_table.
(record_store): Allocate regs_set on reg_obstack.
(dse_step1): Allocate regs_live on reg_obstack.
(dse_step2_init): Allocate offset_map_n and offset_map_p on the new
obstack.
(dse_step3_scan): Allocate bitmaps on the new bitmap obstack.
(dse_step3): Likewise.
(dse_confluence_0): Likewise.
(dse_confluence_n): Likewise.
(dse_transfer_function): Likewise.
(dse_step7): Destroy the new obstacks, and everything allocated on
them, in one big sweep.
(rest_of_handle_dse): Update.
* cfgexpand.c (stack_var_bitmap_obstack): New bitmap obstack.
(add_stack_var_conflict): Allocate bitmaps on it.
(add_scope_conflicts_1): Likewise.
(add_scope_conflicts): Likewise.
(update_alias_info_with_stack_vars): Likewise.
(init_vars_expansion): Move TREE_USED fiddling expand_used_vars.
Initialize the new bitmap obstack.
(fini_vars_expansion): Release it.
(estimated_stack_frame_size): Use init_vars_expansion to set things up
and always clean up at the end.
(expand_used_vars): Do the TREE_USED trickery here.  Always call
fini_vars_expansion.
* tree-ssa-live.h (struct tree_live_info_d): Make livein and liveout
arrays of bitmap_head to avoid one indirection per bitmap access.
(live_on_entry, live_on_exit, live_var_map, live_merge_and_clear,
make_live_on_entry): Update.
* tree-ssa-live.c (partition_view_bitmap): Don't double-free 'used'.
(liveness_bitmap_obstack): New bitmap obstack.
(remove_unused_locals): Use it to allocate all bitmaps on.  Update
for livein/liveout changes in tree-ssa-live.h.
(delete_tree_live_info): Release the bitmap obstack.
(loe_visit_block, live_worklist, set_var_live_on_entry,
calculate_live_on_exit, dump_live_info): Update.
(calculate_live_ranges): Initialize the bitmap.
* tree-ssa-ter.c (ter_bitmap_obstack): New bitmap obstack.
(new_temp_expr_table): Allocate bitmap on it.
(make_dependent_on_partition, add_to_partition_kill_list,
add_dependence, process_replaceable): Likewise.
(find_replaceable_exprs): Initialize and release the new obstack here.
* df-problems.c (df_lr_add_problem): Allocate persistent bitmap
for out_of_date_transfer_functions on df_bitmap_obstack.
(df_live_add_problem): Likewise.
(df_chain_add_problem): Likewise.
(df_word_lr_add_problem): Likewise.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgexpand.c
trunk/gcc/df-problems.c
trunk/gcc/dse.c
trunk/gcc/tree-ssa-live.c
trunk/gcc/tree-ssa-live.h
trunk/gcc/tree-ssa-loop-im.c
trunk/gcc/tree-ssa-loop-manip.c
trunk/gcc/tree-ssa-ter.c


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-16 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #48 from Steven Bosscher steven at gcc dot gnu.org 2012-08-16 
10:52:20 UTC ---
Author: steven
Date: Thu Aug 16 10:52:14 2012
New Revision: 190442

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190442
Log:
PR middle-end/54146
* tree-flow.h (compute_global_livein): Remove prototype.
* tree-into-ssa.c (compute_global_livein): Remove function.
* tree-ssa-loop-manip.c: Include gimple-pretty-print.h.
(find_sibling_superloop): New function.
(compute_live_loop_exits): New function.
(add_exit_phis_edge): Rename to add_exit_phi.  Do not allow
inserting a PHI in a block that is not a loop exit for VAR.
Add dumping if TDF_DETAILS.
(add_exit_phis_var): Rewrite.
(add_exit_phis): Update.
(get_loops_exits): Rewrite to return an array of per-loop exits
rather than one bitmap with all loop exits.
(find_uses_to_rename_bb): Ignore virtual PHI nodes.
(rewrite_into_loop_closed_ssa): Update.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-flow.h
trunk/gcc/tree-into-ssa.c
trunk/gcc/tree-ssa-loop-manip.c


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-16 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Richard Guenther rguenth at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||memory-hog

--- Comment #49 from Richard Guenther rguenth at gcc dot gnu.org 2012-08-16 
12:10:56 UTC ---
We still use very much memory (full testcase doesn't fit in 4GB ram).  With

...
  checkCGAL::Gmpfr();
  //checkCGAL::Gmpfi();
  //checkCGAL::QuotientCGAL::Gmpz ();
  //checkCGAL::Lazy_exact_ntCGAL::Gmpq ();
  //checkCORE::BigInt();
  //checkCORE::BigRat();
  //checkCORE::BigFloat();
  //checkCORE::Expr();

I see (--enable-gather-detailed-mem-stats):

Kind   Nodes  Bytes
---
decls1116380  181227400
types 535841   90021288
blocks222522   17801760
stmts  457492068872
refs  854577   43178392
exprs2147881   95117920
constants 1401324176820
identifiers737106486480
vecs 2654190  131466648
binfos 481624873432
ssa names 339369   27149520
constructors   26426 634224
random kinds 1885177   73894984
lang_decl kinds   352117   13918872
lang_type kinds482187329136
omp clauses0  0
---
Total10490451  699345748

a lot of memory in TREE_VECs for some reason.

GIMPLE statements
Kind   Stmts  Bytes
---
assignments   316019   30602376
phi nodes  54994   15777472
conditionals   260902504640
everything else   237509   23110772
---
Total 634612   71995260

gimple is lean, so is RTL ;)

Alloc-pool Kind Elt size  Pools  Allocated (elts)Peak
(elts)Leak (elts)
--
live ranges40513   19250760(481269)   10800800(   
270020)  0( 0)
df_scan ref base   56   1026  331010008(   5910893)   14059808(   
251068)  0( 0)
df_scan ref artificial 64   1026   20113600(314275)4239872(
66248)  0( 0)
df_scan ref regular64   1026   66557568(   1039962)4543872(
70998)  0( 0)

are by far the biggest alloc-pool users.

bitmap stats are confusing because they show leaks for bitmaps we free
by releasing their obstack.  DF and PTA bitmaps are largest.

We leak some VECs ...

c-family/c-pragma.c:619 (push_visibility)24: 0.0% 24   
   1: 0.0%
cp/pt.c:471 (maybe_begin_member_template_process 24: 0.0% 24   
   1: 0.0%
function.c:4513 (push_struct_function)   40: 0.0% 40   
   1: 0.0%
vec.c:307 (vec_stack_p_reserve_exact_1)  40: 0.0% 40   
   1: 0.0%
tree-ssa-loop-ivopts.c:3070 (multiplier_allowed_328: 0.0%608   
   3: 0.0%
tree-ssa-loop-ivopts.c:3153 (get_address_cost)  328: 0.0%608   
   3: 0.0%
tree-ssa-sccvn.c:745 (copy_reference_ops_from_re392: 0.0% 806232   
  102098: 4.6%
cfgloop.h:583 (fel_init)480: 0.0%860   
 106: 0.0%
c-family/c-pragma.c:1246 (c_register_pragma_1)  584: 0.0%696   
   4: 0.0%
function.c:156 (push_function_context)  976: 0.0%   1200   
   8: 0.0%
ira.c:3699 (find_moveable_pseudos) 1240: 0.0% 221128   
 513: 0.0%
passes.c:2188 (execute_one_pass)   4360: 0.1% 655320   
   16466: 0.7%
tree-ssa-structalias.c:3870 (handle_lhs_call)  9576: 0.2%  18360   
 133: 0.0%
ipa-ref.c:55 (ipa_record_reference)   60184: 1.1% 327640   
5813: 0.3%
cfgloop.c:1143 (get_loop_exit_edges)  73184: 1.3% 157888   
   62221: 2.8%
tree-into-ssa.c:940 (mark_phi_for_rewrite)   153360: 2.7% 164096   
  17: 0.0%
cfgloop.c:1134 (get_loop_exit_edges) 166592: 3.0% 238712   
   11639: 0.5%
ipa-reference.c:184 (set_reference_optimization_ 180248: 3.2% 248064   
  47: 0.0%
tree-into-ssa.c:321 (get_ssa_name_ann)   627448:11.2% 716496   
  14: 0.0%
tree-ssa-sccvn.c:3657 (extract_and_process_scc_f1246864:22.3%1291960   
  105903: 4.7%
tree-ssa-loop-im.c:1562 (record_mem_ref_loc)1292560:23.1%1392576   
   55465: 2.5%
tree-ssa-loop-im.c:1551 (record_mem_ref_loc)1771800:31.7%3141360   
   52717: 2.4%


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-16 Thread stevenb.gcc at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #50 from stevenb.gcc at gmail dot com stevenb.gcc at gmail dot 
com 2012-08-16 13:55:40 UTC ---
On Thu, Aug 16, 2012 at 2:10 PM, rguenth at gcc dot gnu.org
gcc-bugzi...@gcc.gnu.org wrote:
 bitmap stats are confusing because they show leaks for bitmaps we free
 by releasing their obstack.  DF and PTA bitmaps are largest.

The bitmap obstack stats are not very reliable anyway, I think. I've
been using my own stats for this (with a size_t version of
obstack_memory_used:

+static size_t
+obstack_memory_used2 (struct obstack *h)
+{
+  struct _obstack_chunk* lp;
+  size_t nbytes = 0;
+
+  for (lp = h-chunk; lp != 0; lp = lp-prev)
+{
+  nbytes += (size_t) (lp-limit - (char *) lp);
+}
+  return nbytes;
+}
+

so that also the freed bitmap elements are counted). With that, you
can show that the reg_obstack and bitmap_default_obstack grow up to
several GB that isn't released between passes. An added problem is
that IRA puts its bitmap on its own obstack (as it should, really) but
that means the 3GB of free elements on the reg_obstack and
bitmap_default_obstack remain unused. So on the machine I use for
testing (cfarm gcc17), the memory footprint is reduced by 2GB (~25%)
with this hack:

Index: ira.c
===
--- ira.c   (revision 190442)
+++ ira.c   (working copy)
@@ -4132,6 +4132,12 @@
   int max_regno_before_ira, ira_max_point_before_emit;
   int rebuild_p;

+  /* There shouldn't be anything on these obstacks.  */
+  bitmap_obstack_release (NULL);
+  bitmap_obstack_initialize (NULL);
+  bitmap_obstack_release (reg_obstack);
+  bitmap_obstack_initialize (reg_obstack);
+
   if (flag_caller_saves)
 init_caller_save ();


There is in general a lot of BITMAP_ALLOC(NULL) abuse in the compiler.
I have patches to address the cases in tree-ssa-live.c and dse.c, and
I intend to look at the tree-ssa-ter and cfgexpand cases this weekend.


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-16 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #51 from rguenther at suse dot de rguenther at suse dot de 
2012-08-16 14:06:06 UTC ---
On Thu, 16 Aug 2012, stevenb.gcc at gmail dot com wrote:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146
 
 --- Comment #50 from stevenb.gcc at gmail dot com stevenb.gcc at gmail dot 
 com 2012-08-16 13:55:40 UTC ---
 On Thu, Aug 16, 2012 at 2:10 PM, rguenth at gcc dot gnu.org
 gcc-bugzi...@gcc.gnu.org wrote:
  bitmap stats are confusing because they show leaks for bitmaps we free
  by releasing their obstack.  DF and PTA bitmaps are largest.
 
 The bitmap obstack stats are not very reliable anyway, I think. I've
 been using my own stats for this (with a size_t version of
 obstack_memory_used:
 
 +static size_t
 +obstack_memory_used2 (struct obstack *h)
 +{
 +  struct _obstack_chunk* lp;
 +  size_t nbytes = 0;
 +
 +  for (lp = h-chunk; lp != 0; lp = lp-prev)
 +{
 +  nbytes += (size_t) (lp-limit - (char *) lp);
 +}
 +  return nbytes;
 +}
 +
 
 so that also the freed bitmap elements are counted). With that, you
 can show that the reg_obstack and bitmap_default_obstack grow up to
 several GB that isn't released between passes. An added problem is

Hum, I thought we release those obstacks after each pass ...


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-16 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #52 from Richard Guenther rguenth at gcc dot gnu.org 2012-08-16 
14:27:59 UTC ---
Author: rguenth
Date: Thu Aug 16 14:27:51 2012
New Revision: 190445

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190445
Log:
2012-08-16  Richard Guenther  rguent...@suse.de

PR middle-end/54146
* tree-ssa-loop-niter.c (find_loop_niter_by_eval): Free the
exit vector.
* ipa-pure-const.c (analyze_function): Use FOR_EACH_LOOP_BREAK.
* cfgloop.h (FOR_EACH_LOOP_BREAK): Fix.
* tree-ssa-structalias.c (handle_lhs_call): Properly free rhsc.
* tree-into-ssa.c (get_ssa_name_ann): Allocate info only when
needed.
* tree-ssa-loop-im.c (analyze_memory_references): Adjust.
(tree_ssa_lim_finalize): Free all mem_refs.
* tree-ssa-sccvn.c (extract_and_process_scc_for_name): Free
scc when bailing out.
* modulo-sched.c (sms_schedule): Use FOR_EACH_LOOP_BREAK.
* ira-build.c (loop_with_complex_edge_p): Free loop exit vector.
* graphite-sese-to-poly.c (scop_ivs_can_be_represented): Use
FOR_EACH_LOOP_BREAK.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgloop.h
trunk/gcc/graphite-sese-to-poly.c
trunk/gcc/ipa-pure-const.c
trunk/gcc/ira-build.c
trunk/gcc/modulo-sched.c
trunk/gcc/tree-into-ssa.c
trunk/gcc/tree-ssa-loop-im.c
trunk/gcc/tree-ssa-loop-niter.c
trunk/gcc/tree-ssa-sccvn.c
trunk/gcc/tree-ssa-structalias.c


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-16 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher steven at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #28020|0   |1
is obsolete||

--- Comment #53 from Steven Bosscher steven at gcc dot gnu.org 2012-08-16 
23:04:04 UTC ---
Created attachment 28039
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28039
More dedicated obstacks

This, together with Richi's leak plugs from earlier today, brings peak memory
down to 5GB. That's 5GB now, down from 9GB this morning. Not bad for a day's
work.


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-15 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher steven at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #27930|0   |1
is obsolete||
  Attachment #27946|0   |1
is obsolete||
  Attachment #27953|0   |1
is obsolete||
  Attachment #27957|0   |1
is obsolete||
  Attachment #27979|0   |1
is obsolete||

--- Comment #46 from Steven Bosscher steven at gcc dot gnu.org 2012-08-15 
14:33:27 UTC ---
Created attachment 28020
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28020
Faster rewrite_into_loop_closed_ssa

This reduces time spent in rewrite_into_loop_closed_ssa to something too small
to show up in the timevar measurements.


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-15 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #47 from Steven Bosscher steven at gcc dot gnu.org 2012-08-15 
15:07:05 UTC ---
(In reply to comment #46)
 Created attachment 28020 [details]
 Faster rewrite_into_loop_closed_ssa

After this patch, IRA is the only major bottle-neck left, although there are
still a few passes that take more time than I think is reasonable (out-of-SSA
in particular). New top 10 spenders:

 integrated RA   : 227.36 (30%) usr 678602 kB (19%) ggc
 tree SSA incremental:  66.17 ( 9%) usr 151828 kB ( 4%) ggc
 out of ssa  :  43.76 ( 6%) usr608 kB ( 0%) ggc
 tree PTA:  43.06 ( 6%) usr  27466 kB ( 1%) ggc
 df live regs:  35.03 ( 5%) usr  0 kB ( 0%) ggc
 df liveinitialized regs:  34.25 ( 5%) usr  0 kB ( 0%) ggc
 tree SSA rewrite:  18.37 ( 2%) usr  24888 kB ( 1%) ggc
 remove unused locals:  17.97 ( 2%) usr  0 kB ( 0%) ggc
 dominance computation   :  15.06 ( 2%) usr  0 kB ( 0%) ggc
 tree CFG cleanup:  11.79 ( 2%) usr   1123 kB ( 0%) ggc


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-14 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #44 from Richard Guenther rguenth at gcc dot gnu.org 2012-08-14 
12:38:41 UTC ---
Author: rguenth
Date: Tue Aug 14 12:38:32 2012
New Revision: 190382

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190382
Log:
2012-08-14  Richard Guenther  rguent...@suse.de

PR tree-optimization/54146
* tree-ssa-pre.c (do_regular_insertion): Use a VEC
indexed by pred edge index for avail.
(do_partial_partial_insertion): Likewise.
(insert_into_preds_of_block): Adjust.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-pre.c


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-14 Thread glisse at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #45 from Marc Glisse glisse at gcc dot gnu.org 2012-08-14 
13:20:55 UTC ---
(In reply to comment #11)
  Marc, do you know where the use of the
  flatten attribute comes from in your code?
 Comes from the Eigen library, I'll talk to them about it and see if they can
 comment. They mostly deal with simple number types (double, float), so the
 behavior with heavy types might have been unnoticed.

For the record:
after doing some benchmarks, they have removed the attribute, it didn't improve
performance anymore (though it probably used to in the past).

I have to say I am extremely impressed by all the improvements you guys have
done based on this bad code (which I realize is now just a handy example of big
code with the flatten attribute high enough in the call chain). If you keep at
it, next time I might not even notice there is something wrong with the code
;-)


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-10 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #42 from Steven Bosscher steven at gcc dot gnu.org 2012-08-10 
06:32:37 UTC ---
(In reply to comment #40)
 Quite an achivement that Steven managed to
 chase out all the other cases.

Thanks for the compliment :-)

I'm still working on the rewrite_into_loop_closed_ssa slowness. I will have a
patch ready this weekend.

Next: On to -O2 :-)


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-10 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #43 from Jan Hubicka hubicka at gcc dot gnu.org 2012-08-10 
07:52:31 UTC ---
Author: hubicka
Date: Fri Aug 10 07:52:23 2012
New Revision: 190283

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190283
Log:

PR middle-end/54146
* ipa-inline-transform.c (inline_call): Add UPDATE_OVERALL_SUMMARY
parameter; honnor it.
* ipa-inline.c (recursive_inlining): Update call
of inline_call.
(inline_small_functions): Likewise.
(ipa_inline): Likewise.
(inline_always_inline_functions): Likewise.
(early_inline_small_functions): Likewise.
(flatten_function): Do separate update of summary info.
* ipa-inline.h (inline_update_overall_summary): Declare.
(inline_call): Update.
* ipa-inline-analysis.c (inline_merge_summary): Break out
updating code to ...
(inline_update_overall_summary): Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-inline-analysis.c
trunk/gcc/ipa-inline-transform.c
trunk/gcc/ipa-inline.c
trunk/gcc/ipa-inline.h


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-09 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #37 from Jan Hubicka hubicka at ucw dot cz 2012-08-10 03:45:31 
UTC ---
 
 I suppose it's the old issue that we update fibheap keys along each
 inlining decision - and with flatten there are just very many ... Honza?

Well, I killed most of updating for mainline and at flattening time the keys
are not
even computed.  So it is probably some other bookeping.  I will profile it.

 It also seems we flatten depth-first (thus inline leafs) instead of
 the other way around:
 
   orig_callee = callee;
   inline_call (e, true, NULL, NULL);
   if (e-callee != orig_callee)
 orig_callee-symbol.aux = (void *) node;
   flatten_function (e-callee, early);
   if (e-callee != orig_callee)
 orig_callee-symbol.aux = NULL;
 
 This means we will materialize all intermediate flattenings in
 functions that will not be reclaimed, right?  flattening foo
 should inline everything into foo, but not affect remaining
 callee bodies.

I do not follow here.  Callee is already the inline clone, so we will
not affect other copies of foo...

Honza
 
 Richard.
 
 -- 
 Configure bugmail: http://gcc.gnu.org/bugzilla/userprefs.cgi?tab=email
 --- You are receiving this mail because: ---
 You are on the CC list for the bug.


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-09 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #38 from Jan Hubicka hubicka at ucw dot cz 2012-08-10 03:50:26 
UTC ---
 I don't understand how inline_merge_summary is supposed to work, so I'm going
 to leave that one for Richi and Honza.
Well, it produces inline summary for function that is result of inlininig based
on summaries
of the former functions.  You can't just bypass it for flattening or the
summary
of the function after flatting will be off. Obviously the function does fair
amount of
propagation. I will try to speed it up or limit the bounds. Since the summaries
are of
constant size, we should not explode here.

Honza


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-09 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #39 from Jan Hubicka hubicka at ucw dot cz 2012-08-10 03:53:46 
UTC ---
 Martin, can you look at comment #14 and the patch?  I think what we want to
 do in flatten_function is before
 
   inline_call (e, true, NULL, NULL);
 
 reset the edge predicates so that inline_merge_summary becomes very cheap.

Well, this will still result in (conservatively) wrong inline summary for
flattened function. I am not sure if flatten attribute should laso mean
do not care about inlining of the flattened function much.
I think we want to handle this generically even if it is harder for
small function inliner to explode here because it does a lot less cascaded
inlining.

 Unfortunately that beast seems to have no early out (but instead uses
 true_predicate () ...).  Can we speed it up for the case where we just
 want fast operation rather than precise accounting of sizes/time in the
 inlined-to caller?

I will look.

Honza


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-09 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #40 from Jan Hubicka hubicka at gcc dot gnu.org 2012-08-10 
04:34:54 UTC ---
OK,
my simple ^C profiling shows:
#14 0x0091f17f in estimate_edge_size_and_time (e=0x7fffb7192d68,
size=optimized out, time=0x7fffc04110b0, prob=1) at
../../gcc/ipa-inline-analysis.c:2183
#15 0x0091f23a in estimate_calls_size_and_time (node=0x7fffb7193af8,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2274
#16 0x0091f2d1 in estimate_calls_size_and_time (node=0x7fffb71939c0,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#17 0x0091f2d1 in estimate_calls_size_and_time (node=0x7fffb7193888,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#18 0x0091f2d1 in estimate_calls_size_and_time (node=0x7fffb7191c30,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#19 0x0091f2d1 in estimate_calls_size_and_time (node=0x7fffb7180c30,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#20 0x0091f2d1 in estimate_calls_size_and_time (node=0x7fffb7147d68,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#21 0x0091f2d1 in estimate_calls_size_and_time (node=0x7fffcbb039c0,
size=0x7fffc04110b4, time=0x7fffc04110b0, possible_truths=4294967294,
known_vals=0x0, 
known_binfos=0x0) at ../../gcc/ipa-inline-analysis.c:2277
#22 0x009232a1 in inline_merge_summary (edge=0x7fffb6fba068) at
../../gcc/ipa-inline-analysis.c:2687
#23 0x00ed0cdf in inline_call (e=0x7fffb6fba068,
update_original=optimized out, new_edges=0x0, overall_size=0x0) at
../../gcc/ipa-inline-transform.c:246
#24 0x00ece3a9 in flatten_function (node=0x7fffb6fb9ea0, early=1
'\001') at ../../gcc/ipa-inline.c:1605
#25 0x00ece3bf in flatten_function (node=0x7fffb6fb4c30, early=1
'\001') at ../../gcc/ipa-inline.c:1608
#26 0x00ece3bf in flatten_function (node=0x7fffb6f87270, early=1
'\001') at ../../gcc/ipa-inline.c:1608
#27 0x00ece3bf in flatten_function (node=0x7fffcbb039c0, early=1
'\001') at ../../gcc/ipa-inline.c:1608
#28 0x00ece616 in early_inliner () at ../../gcc/ipa-inline.c:1881

so the time is actually not spent in the predicate merging logic, but in
computing overall size/time of the function after inline operation is
performed. This is because of following call in inline_merge_summary:

  estimate_calls_size_and_time (to, info-size, info-time,
~(clause_t)(1  predicate_false_condition),
NULL, NULL);

Here we simply recompute the size/time by walking all callees of the function
and since this involve recursive walk of the inlined callees that are many we
hit quadratic time complexity.
Ugly.  One (particularly ugly) workaround would be to make inline_merge_summary
to skip updating after each step of flattening because flattening do not care,
but that is quite a kludge (similar to Steven's patch but with one extra update
on the very end).

Other solution is to work hard enough to update everything incrementally, but
it is not 100% trivial (that is why I simply recompute). 
I actually tought of this case and concluded that if we do so deep inline trees
we will blow up somewhere else.  Quite an achivement that Steven managed to
chase out all the other cases.

Honza


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-09 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #41 from Jan Hubicka hubicka at gcc dot gnu.org 2012-08-10 
05:18:52 UTC ---
Created attachment 27979
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=27979
Path for inliner slowness

Hi,
this is patch I am testing. After some consideration I do not like the
incremental update idea much. It asks for bookkeeping issues and has roundoff
problems so unless it is really necessary I will try to stick with the
recomputation scheme. This is more careful variant of Steven's patch. We skip
the recomputation after each incremental inline and update the info once we are
done with the flattening.

Honza


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-08 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher steven at gcc dot gnu.org changed:

   What|Removed |Added

  Component|tree-optimization   |middle-end

--- Comment #31 from Steven Bosscher steven at gcc dot gnu.org 2012-08-08 
06:28:16 UTC ---
Author: steven
Date: Wed Aug  8 06:28:10 2012
New Revision: 190222

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190222
Log:
PR middle-end/54146
* ifcvt.c: Include pointer-set.h.
(cond_move_process_if_block): Change type of then_regs and
else_regs from alloca'd array to pointer_sets.
(check_cond_move_block): Update for this change.
(cond_move_convert_if_block): Likewise.
* Makefile.in: Fix dependencies for ifcvt.o.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/ifcvt.c

--- Comment #32 from Steven Bosscher steven at gcc dot gnu.org 2012-08-08 
06:29:16 UTC ---
Author: steven
Date: Wed Aug  8 06:29:12 2012
New Revision: 190223

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190223
Log:
PR middle-end/54146
* ira.c (init_live_subregs): Take live_subregs_used as a bitmap.
(build_insn_chain): Make live_subregs_used a bitmap.
Use SBITMAP_SIZE to ignore the paradoxical bytes of subregs.
Use sbitmap_free to free the live_subreg sbitmaps.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/ira.c


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-08 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

Steven Bosscher steven at gcc dot gnu.org changed:

   What|Removed |Added

  Component|tree-optimization   |middle-end

--- Comment #32 from Steven Bosscher steven at gcc dot gnu.org 2012-08-08 
06:29:16 UTC ---
Author: steven
Date: Wed Aug  8 06:29:12 2012
New Revision: 190223

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190223
Log:
PR middle-end/54146
* ira.c (init_live_subregs): Take live_subregs_used as a bitmap.
(build_insn_chain): Make live_subregs_used a bitmap.
Use SBITMAP_SIZE to ignore the paradoxical bytes of subregs.
Use sbitmap_free to free the live_subreg sbitmaps.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/ira.c


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-08 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #33 from Richard Guenther rguenth at gcc dot gnu.org 2012-08-08 
08:09:25 UTC ---
(In reply to comment #29)
 Created attachment 27957 [details]
 Do not traverse sibling loops
 
 The idea here is to note that for a nested loop we know for sure that the loop
 exits into a common loop father must all need PHI nodes, because we have have
 loop pre-headers. I'm not sure what the effect of this is on irreducible 
 loops,
 and I haven't tested this much yet. Comments sought...

Nice results.  A few comments - if you rely on preheaders then please assert
in rewrite_into_loop_closed_ssa that loops_state_satisfies_p
(LOOPS_HAVE_PREHEADERS).

Irreducible loops are one loop to cfgloop and to loop-closed SSA form, so
they should be fine.

What do you mean by for a nested loop we know for sure that the loop
exits into a common loop father must all need PHI nodes?  What happens for
nested loops that do not exit into a common loop father?  That is what's
common here?  I think what you want to say is that a loop cannot exit
to a child of a sibling of a father, thus:

  for (;;)
for (;;)
  goto x;
  for (;;)
x:

cannot happen in the sense that loop detection would not detect the loop
nest as written (but had a single irreducible loop)?

I think you should simply move compute_global_livein to its single use
and make it static.


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-08 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #34 from Steven Bosscher steven at gcc dot gnu.org 2012-08-08 
10:10:46 UTC ---
(In reply to comment #33)
 I think you should simply move compute_global_livein to its single use
 and make it static.

Yes, and I need to add the same smarts there as in find_uses_to_rename_use to
look through loops at the same nesting level, because this test case fails:

extern unsigned use (unsigned);
void foo (void)
{
  unsigned i, j;
  do { i = use (0); } while (i);
  do { j = use (0); } while (j);
  if (i) use (j);
}

It fails in check_loop_closed_ssa_use at -O2 but passes at -O2 -fno-tree-vrp,
so VRP is doing something destroying the initially correct loop-closed SSA
form.


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-08 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #35 from Richard Guenther rguenth at gcc dot gnu.org 2012-08-08 
11:49:21 UTC ---
(In reply to comment #34)
 (In reply to comment #33)
  I think you should simply move compute_global_livein to its single use
  and make it static.
 
 Yes, and I need to add the same smarts there as in find_uses_to_rename_use to
 look through loops at the same nesting level, because this test case fails:
 
 extern unsigned use (unsigned);
 void foo (void)
 {
   unsigned i, j;
   do { i = use (0); } while (i);
   do { j = use (0); } while (j);
   if (i) use (j);
 }
 
 It fails in check_loop_closed_ssa_use at -O2 but passes at -O2 -fno-tree-vrp,
 so VRP is doing something destroying the initially correct loop-closed SSA
 form.

Jump threading I suppose.  I think you can drop loop-closed SSA form
right before calling vrp_finalize ().


[Bug middle-end/54146] Very slow compile with attribute((flatten))

2012-08-08 Thread steven at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54146

--- Comment #36 from Steven Bosscher steven at gcc dot gnu.org 2012-08-08 
17:39:49 UTC ---
Author: steven
Date: Wed Aug  8 17:39:46 2012
New Revision: 190235

URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=190235
Log:
PR middle-end/54146
* gimpify.c (gimplify_body): Only verify_gimple_in_seq with
checking enabled.
* tree-ssa-loop-manip.c (add_exit_phis_var): Assert that var is
a gimple_reg if checking is enabled.
(find_uses_to_rename_stmt): Only look at non-virtual USE operands.
* tree-into-ssa (compute_global_livein): Change the worklist
type from an array to a VEC.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/gimplify.c
trunk/gcc/tree-into-ssa.c
trunk/gcc/tree-ssa-loop-manip.c