[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-03-18 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P2


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-03-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

--- Comment #16 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 34974
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34974&action=edit
Patch to limit coalescing amount

The committed patch improves peak memory usage from 7.6GB to 5.8GB for the
small testcase.

The attached patch reduces memory usage from SSA coalescing further (to ~300MB)
by simply doing less coalescing.  Unfortunately the generated RTL puts a bigger
load on CSE/DF and thus we need 7.6GB again (possibly one can find an optimal
value for --param max-out-of-ssa-coalesce-names, but that's probably highly
testcase specific).

In theory you can iterate on coalescing piecewise as well, but the overhead
for doing this might be too big (basically up to computing live/conflict
for each coalesce pair separately, taking into account previous coalesces).


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-03-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

--- Comment #18 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Richard Biener from comment #17)
> Created attachment 34975 [details]
> do not compute live/conflict for abnormal coalesces
> 
> This is the other idea of simply not computing live/conflict for abnormal
> coalesces we know to always succeed.  This shrinks the following
> live/conflict problem for the regular coalesces by unifying some partitions.
> 
> Doesn't help this particular testcase much.

But it fixes PR63155 ...


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-03-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

--- Comment #15 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Fri Mar  6 12:34:28 2015
New Revision: 221237

URL: https://gcc.gnu.org/viewcvs?rev=221237&root=gcc&view=rev
Log:
2015-03-06  Richard Biener  rguent...@suse.de

PR middle-end/64928
* tree-ssa-live.h (struct tree_live_info_d): Add livein_obstack
and liveout_obstack members.
(calculate_live_on_exit): Remove.
(calculate_live_ranges): Change declaration.
* tree-ssa-live.c (liveness_bitmap_obstack): Remove global var.
(new_tree_live_info): Adjust.
(calculate_live_ranges): Delete livein when not wanted.
(calculate_live_ranges): Do not initialize liveness_bitmap_obstack.
Deal with partly deleted live info.
(loe_visit_block): Remove temporary bitmap by using
bitmap_ior_and_compl_into.
(live_worklist): Adjust accordingly.
(calculate_live_on_exit): Make static.
* tree-ssa-coalesce.c (coalesce_ssa_name): Tell calculate_live_ranges
we do not need livein.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-coalesce.c
trunk/gcc/tree-ssa-live.c
trunk/gcc/tree-ssa-live.h
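
For readers unfamiliar with the loe_visit_block change in the log above:
bitmap_ior_and_compl_into computes A |= B & ~C in one pass, so the
intermediate "B & ~C" bitmap never has to be allocated.  The sketch below shows
the same operation on plain word arrays.  It is only an illustration, not
GCC's bitmap implementation, and words_ior_and_compl_into is a hypothetical
name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute A |= B & ~C word by word without a temporary array.
   Return true if any bit of A changed.  */
static bool
words_ior_and_compl_into (uint64_t *a, const uint64_t *b,
                          const uint64_t *c, size_t nwords)
{
  bool changed = false;
  for (size_t i = 0; i < nwords; i++)
    {
      uint64_t merged = a[i] | (b[i] & ~c[i]);
      if (merged != a[i])
        {
          a[i] = merged;
          changed = true;
        }
    }
  return changed;
}
```

The "changed" result is what a worklist-based liveness solver typically uses
to decide whether a predecessor block has to be revisited.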


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-03-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

--- Comment #17 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 34975
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34975&action=edit
do not compute live/conflict for abnormal coalesces

This is the other idea of simply not computing live/conflict for abnormal
coalesces we know to always succeed.  This shrinks the following live/conflict
problem for the regular coalesces by unifying some partitions.

Doesn't help this particular testcase much.


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-03-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

--- Comment #14 from Richard Biener rguenth at gcc dot gnu.org ---
Note that if we fix out-of-SSA coalescing (patch in testing) then RTL CSE
explodes via DF.


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-03-05 Thread steven at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

Steven Bosscher steven at gcc dot gnu.org changed:

   What|Removed |Added

 CC||steven at gcc dot gnu.org

--- Comment #12 from Steven Bosscher steven at gcc dot gnu.org ---
(In reply to Richard Biener from comment #9)
> It seems that loop invariant motion is responsible for most of the abnormals,
> thus -fno-tree-loop-im restores performance.
> 
> The loop LIM detects is of style
> 
>   <bb 6>: (header)
>   # ___fp_3(ab) = PHI <___fp_41(4), ___fp_5(21)>
>   # ___r1_7(ab) = PHI <___r1_42(4), ___r1_9(21)>
>   # ___r2_11(ab) = PHI <___r2_43(4), ___r3_17(21)>
>   # ___r3_19(ab) = PHI <___r3_44(4), ___r3_23(21)>
>   # ___r4_25 = PHI <___r4_45(4), ___r4_26(21)>
>   # gotovar.17_29 = PHI <_51(4), _69(21)>
>   goto gotovar.17_29;

Perhaps disable LIM (and maybe PRE) if the CFG has a large edge/bb ratio (i.e.
dense CFG)? There's probably no benefit in such cases anyway.


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-03-05 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

--- Comment #13 from Jeffrey A. Law law at redhat dot com ---
I think we've done similar things for Brad's large testcases in the past.  You
want to look at both the edge/bb density and the overall size, i.e. a
high density doesn't really hurt if the total cfg is small.

See is_too_expensive in gcse.c for the current heuristics to avoid trying
global opts on these kinds of testcases.
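
A rough sketch of the kind of check being discussed, combining overall size
with edge/bb density.  The function name and both thresholds are made up for
illustration; they are not the values used by is_too_expensive or anywhere
else in GCC.

```c
#include <stdbool.h>

/* Hypothetical thresholds, chosen only for the example.  */
#define MIN_BLOCKS_TO_CHECK 1000
#define MAX_EDGES_PER_BLOCK 4

/* Return true if the CFG is both large and dense enough that an
   expensive global pass is probably not worth running.  */
static bool
cfg_too_dense_p (unsigned n_basic_blocks, unsigned n_edges)
{
  /* A high density doesn't hurt if the total CFG is small.  */
  if (n_basic_blocks < MIN_BLOCKS_TO_CHECK)
    return false;

  /* Compare the edge/bb ratio against the limit without dividing.  */
  return n_edges > (unsigned long long) n_basic_blocks * MAX_EDGES_PER_BLOCK;
}
```

A pass such as LIM or PRE could consult a predicate like this up front and
bail out early for the function at hand.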


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-03-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

--- Comment #11 from Richard Biener rguenth at gcc dot gnu.org ---
Ok, so it's already calculate_live_ranges that takes much memory.  I have a
small patch to improve that somewhat.

But what we really need is to get the must-coalesce stuff coalesced with
respect to both live and conflict computation.  That is, map must-coalesce
SSA vars to the same partition.  That loses the SSA corruption testing,
though, so it might be much more controversial (silent wrong-code instead of
an ICE).
Unfortunately in the testcase there are only 2750 must-coalesces but
109493 partitions participating in the coalescing (so at least 5 want
coalesces).

The good news is of course that we can simply choose to _not_ coalesce that
many variables, but say only the important ones.
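
As a sketch of "map must-coalesce SSA vars to the same partition": a plain
union-find over partition numbers, run before any live/conflict computation,
would shrink the problem by one partition per successful merge.  This is an
illustration only, not GCC's partition code, and uf_find, uf_union and
merge_must_coalesce are hypothetical names.

```c
#include <stddef.h>

/* Find the representative of X, halving the path on the way.  */
static int
uf_find (int *parent, int x)
{
  while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
  return x;
}

/* Merge the sets containing A and B; return the new representative.  */
static int
uf_union (int *parent, int a, int b)
{
  int ra = uf_find (parent, a);
  int rb = uf_find (parent, b);
  if (ra != rb)
    parent[rb] = ra;
  return ra;
}

/* Start with N singleton partitions, merge every must-coalesce pair
   unconditionally, and return the number of partitions left.  */
static size_t
merge_must_coalesce (int *parent, size_t n,
                     const int (*pairs)[2], size_t npairs)
{
  size_t remaining = n;
  for (size_t i = 0; i < n; i++)
    parent[i] = (int) i;
  for (size_t i = 0; i < npairs; i++)
    if (uf_find (parent, pairs[i][0]) != uf_find (parent, pairs[i][1]))
      {
        uf_union (parent, pairs[i][0], pairs[i][1]);
        remaining--;
      }
  return remaining;
}
```

Note that merging unconditionally like this is exactly what gives up the
conflict check between must-coalesce names, i.e. the SSA corruption testing
mentioned above.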


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-02-16 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

Jeffrey A. Law law at redhat dot com changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #10 from Jeffrey A. Law law at redhat dot com ---
Might want to look at 65076 as well where phase opt and generate is taking 89%
of the compile time.  Might be a better testcase to work with.


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-02-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #9 from Richard Biener rguenth at gcc dot gnu.org ---
It seems that loop invariant motion is responsible for most of the abnormals,
thus -fno-tree-loop-im restores performance.

The loop LIM detects is of style

  <bb 6>: (header)
  # ___fp_3(ab) = PHI <___fp_41(4), ___fp_5(21)>
  # ___r1_7(ab) = PHI <___r1_42(4), ___r1_9(21)>
  # ___r2_11(ab) = PHI <___r2_43(4), ___r3_17(21)>
  # ___r3_19(ab) = PHI <___r3_44(4), ___r3_23(21)>
  # ___r4_25 = PHI <___r4_45(4), ___r4_26(21)>
  # gotovar.17_29 = PHI <_51(4), _69(21)>
  goto gotovar.17_29;

...

  <bb 21>: (latch)
  _67 = ___pc_1 + 15;
  _68 = (void * *) _67;
  _69 = *_68;
  PROF_edge_counter_142 = __gcov0.___H_object_2d__3e_u8vector[14];
  PROF_edge_counter_143 = PROF_edge_counter_142 + 1;
  __gcov0.___H_object_2d__3e_u8vector[14] = PROF_edge_counter_143;
  goto <bb 6>;

Not sure if we should artificially limit such loops.  LIM doesn't account
for the (compile-time) cost of needing very many PHIs when rewriting
the store-motion vars into SSA form (but in theory it could estimate that
cost by taking into account the CFG structure of the loop).

Let's see if we can first generate a smaller testcase to illustrate the
issue.

Mine for now.


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-02-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2015-02-09
 Ever confirmed|0   |1

--- Comment #8 from Richard Biener rguenth at gcc dot gnu.org ---
Ok, so it seems the memory is used by out-of-SSA:

#5  0x00c9eebc in coalesce_ssa_name ()
at /space/rguenther/src/svn/gcc-4_9-branch/gcc/tree-ssa-coalesce.c:1330
1330  graph = build_ssa_conflict_graph (liveinfo);
(gdb) p *cl->list.htab
$10 = {entries = 0x2b19b30, size = 524287, n_elements = 77146, n_deleted = 0, 
  searches = 122189, collisions = 6508, size_prime_index = 16}

where we malloc(!) 77146 entries of size 12.

But of course the real problem is the conflict graph, with 76063 bitmaps
eating up around 1GB of memory for the first testcase (function
___H__23__23_u8vector_2d__3e_object).

That's likely caused by the change to more aggressively coalesce anonymous
SSA names.


[Bug middle-end/64928] [4.8/4.9/5 Regression] Inordinate cpu time and memory usage in phase opt and generate with -ftest-coverage -fprofile-arcs

2015-02-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64928

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

   Keywords||memory-hog
  Component|other   |middle-end
  Known to work||4.4.7
   Target Milestone|--- |4.8.5
    Summary|Inordinate cpu time and     |[4.8/4.9/5 Regression]
           |memory usage in phase opt   |Inordinate cpu time and
           |and generate with           |memory usage in phase opt
           |-ftest-coverage             |and generate with
           |-fprofile-arcs              |-ftest-coverage
           |                            |-fprofile-arcs

--- Comment #7 from Richard Biener rguenth at gcc dot gnu.org ---
Judging from the description, I suppose that non-profiling/coverage mode is fine.