[Bug ipa/60243] IPA is slow on large cgraph tree

2024-02-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed|2018-11-19 00:00:00 |2024-2-19

--- Comment #28 from Richard Biener  ---
Original testcase at -O2:

 callgraph functions expansion  :  11.94 ( 51%)   2.19 ( 42%)  14.14 ( 49%)
  570M ( 40%)
 callgraph ipa passes   :  10.00 ( 43%)   0.70 ( 13%)  10.70 ( 37%)
  601M ( 42%)
 ipa profile:   4.68 ( 20%)   0.00 (  0%)   4.68 ( 16%)
0  (  0%)
 TOTAL  :  23.36  5.22 28.60   
 1430M
23.36user 5.27system 0:28.65elapsed 99%CPU (0avgtext+0avgdata
1152100maxresident)k
0inputs+0outputs (0major+315833minor)pagefaults 0swaps

Jakub's testcase at -O2:

 callgraph functions expansion  :  12.66 ( 30%)   2.21 ( 16%)  14.87 ( 27%)
  505M ( 15%)
 callgraph ipa passes   :  18.28 ( 44%)   0.65 (  5%)  18.94 ( 34%)
  601M ( 18%)
 ipa profile:   4.20 ( 10%)   0.00 (  0%)   4.20 (  8%)
0  (  0%)
 preprocessing  :   1.47 (  4%)   3.27 ( 24%)   4.81 (  9%)
  417M ( 12%)
 lexical analysis   :   2.24 (  5%)   4.08 ( 30%)   6.34 ( 11%)
0  (  0%)
 early inlining heuristics  :   2.83 (  7%)   0.04 (  0%)   2.97 (  5%)
 1658k (  0%)
 inline parameters  :   3.01 (  7%)   0.21 (  2%)   3.16 (  6%)
   29M (  1%)
 tree CFG construction  :   3.44 (  8%)   0.15 (  1%)   3.52 (  6%)
  599M ( 18%)
 tree operand scan  :   4.47 ( 11%)   0.26 (  2%)   4.80 (  9%)
   93M (  3%)
 TOTAL  :  41.73 13.57 55.32   
 3422M
41.73user 13.67system 0:55.42elapsed 99%CPU (0avgtext+0avgdata
2374596maxresident)k
0inputs+0outputs (0major+536990minor)pagefaults 0swaps

So apart from the faster machine, the picture is still as Honza described in
the last comment.

[Bug ipa/60243] IPA is slow on large cgraph tree

2019-11-21 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #27 from Jan Hubicka  ---
The profile_estimate issue is still here; the inliner and early inliner issues
seem solved. It seems that ipa_profile just orders the nodes for propagation the
wrong way round: we propagate from callers to callees, while the toposorter
orders nodes for propagation in the opposite direction.
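
To illustrate why the visiting order matters, here is a minimal sketch on a toy
call chain. It is not GCC code: propagate_hot, sweeps_for_chain and the "hot"
flag are hypothetical names, and the graph is assumed to be a simple chain
where node i calls node i + 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: node i calls node i + 1, so the call graph is the
   chain 0 -> 1 -> ... -> n - 1.  A node becomes "hot" once its caller is.  */

/* Sweep over the nodes until nothing changes; return the number of sweeps.
   If FORWARD, callers are visited before callees, otherwise callees first.  */
static int
propagate_hot (bool *hot, size_t n, bool forward)
{
  int sweeps = 0;
  bool changed = true;

  assert (n > 0);
  while (changed)
    {
      changed = false;
      sweeps++;
      for (size_t k = 1; k < n; k++)
        {
          size_t i = forward ? k : n - k;
          if (hot[i - 1] && !hot[i])
            {
              hot[i] = true;
              changed = true;
            }
        }
    }
  return sweeps;
}

/* Sweeps needed on a chain of N nodes (N <= 64) when only node 0 is hot.  */
static int
sweeps_for_chain (size_t n, bool forward)
{
  bool hot[64] = { true };

  assert (n > 0 && n <= 64);
  return propagate_hot (hot, n, forward);
}
```

On this chain the caller-first order settles after one productive sweep plus
one confirming sweep, while the callee-first order advances only one node per
sweep, so the number of sweeps grows with the chain length.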

operand_scan seems slow too.

Time variable   usr   sys  wall
  GGC
 phase setup:   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)
   1237 kB (  0%)
 phase parsing  :   6.63 (  9%)   6.77 ( 77%)  13.41 ( 17%)
 655497 kB ( 20%)
 phase opt and generate :  64.47 ( 91%)   2.07 ( 23%)  66.57 ( 83%)
2603397 kB ( 80%)
 garbage collection :   0.64 (  1%)   0.00 (  0%)   0.65 (  1%)
  0 kB (  0%)
 dump files :   0.05 (  0%)   0.01 (  0%)   0.04 (  0%)
  0 kB (  0%)
 callgraph construction :   0.91 (  1%)   0.01 (  0%)   0.83 (  1%)
 399235 kB ( 12%)
 callgraph optimization :   0.37 (  1%)   0.00 (  0%)   0.43 (  1%)
  0 kB (  0%)
 callgraph functions expansion  :  15.98 ( 22%)   1.20 ( 14%)  17.18 ( 21%)
 297309 kB (  9%)
 callgraph ipa passes   :  40.57 ( 57%)   0.40 (  5%)  40.99 ( 51%)
 617751 kB ( 19%)
 ipa function summary   :   0.14 (  0%)   0.00 (  0%)   0.14 (  0%)
   1807 kB (  0%)
 ipa dead code removal  :   0.22 (  0%)   0.00 (  0%)   0.24 (  0%)
  0 kB (  0%)
 ipa cp :   0.97 (  1%)   0.03 (  0%)   1.03 (  1%)
 327514 kB ( 10%)
 ipa inlining heuristics:   0.72 (  1%)   0.00 (  0%)   0.63 (  1%)
  84183 kB (  3%)
 ipa function splitting :   0.02 (  0%)   0.00 (  0%)   0.05 (  0%)
  0 kB (  0%)
 ipa various optimizations  :   0.69 (  1%)   0.20 (  2%)   0.89 (  1%)
 128398 kB (  4%)
 ipa reference  :   0.05 (  0%)   0.00 (  0%)   0.05 (  0%)
  0 kB (  0%)
 ipa profile:  18.24 ( 26%)   0.00 (  0%)  18.25 ( 23%)
  0 kB (  0%)
 ipa pure const :   0.45 (  1%)   0.00 (  0%)   0.46 (  1%)
  0 kB (  0%)
 ipa icf:   0.17 (  0%)   0.02 (  0%)   0.17 (  0%)
  0 kB (  0%)
 ipa SRA:   0.21 (  0%)   0.00 (  0%)   0.21 (  0%)
102 kB (  0%)
 ipa free inline summary:   0.03 (  0%)   0.00 (  0%)   0.04 (  0%)
  0 kB (  0%)
 cfg cleanup:   0.00 (  0%)   0.01 (  0%)   0.02 (  0%)
  0 kB (  0%)
 trivially dead code:   0.12 (  0%)   0.03 (  0%)   0.12 (  0%)
  0 kB (  0%)
 df scan insns  :   0.85 (  1%)   0.14 (  2%)   1.28 (  2%)
 46 kB (  0%)
 df multiple defs   :   0.30 (  0%)   0.06 (  1%)   0.31 (  0%)
  0 kB (  0%)
 df reaching defs   :   0.69 (  1%)   0.05 (  1%)   0.63 (  1%)
  0 kB (  0%)
 df live regs   :   0.49 (  1%)   0.02 (  0%)   0.57 (  1%)
  0 kB (  0%)
 df live regs   :   0.19 (  0%)   0.01 (  0%)   0.12 (  0%)
  0 kB (  0%)
 df must-initialized regs   :   0.10 (  0%)   0.00 (  0%)   0.10 (  0%)
  0 kB (  0%)
 df use-def / def-use chains:   0.44 (  1%)   0.05 (  1%)   0.40 (  1%)
  0 kB (  0%)
 df reg dead/unused notes   :   1.35 (  2%)   0.09 (  1%)   1.15 (  1%)
747 kB (  0%)
 register information   :   0.16 (  0%)   0.00 (  0%)   0.18 (  0%)
  0 kB (  0%)
 alias analysis :   0.16 (  0%)   0.00 (  0%)   0.11 (  0%)
436 kB (  0%)
 alias stmt walking :   0.49 (  1%)   0.07 (  1%)   0.67 (  1%)
  0 kB (  0%)
 register scan  :   0.04 (  0%)   0.00 (  0%)   0.01 (  0%)
  0 kB (  0%)
 rebuild jump labels:   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)
  0 kB (  0%)
 preprocessing  :   2.37 (  3%)   2.37 ( 27%)   4.49 (  6%)
 383477 kB ( 12%)
 lexical analysis   :   1.88 (  3%)   2.13 ( 24%)   4.20 (  5%)
  0 kB (  0%)
 parser (global):   0.01 (  0%)   0.01 (  0%)   0.03 (  0%)
   1442 kB (  0%)
 parser function body   :   2.19 (  3%)   2.26 ( 26%)   4.50 (  6%)
 270577 kB (  8%)
 early inlining heuristics  :   2.80 (  4%)   0.03 (  0%)   2.81 (  4%)
   3076 kB (  0%)
 inline parameters  :   6.43 (  9%)   0.14 (  2%)   6.74 (  8%)
  31127 kB (  1%)
 integration:   0.17 (  0%)   0.00 (  0%)   0.08 (  0%)
   6789 kB (  0%)
 tree gimplify  :   1.01 (  1%)   0.03 (  0%)   1.15 (  1%)
 610970 kB ( 19%)
 tree eh:   0.50 (  1%)   0.03 (  0%)   0.44 (  1%)
  0 kB (  0%)
 tree CFG construction  :   3.50 (  5%)   0.02 (  0%)   3.74 (  5%)
 628087 kB ( 19%)
 tree CFG cleanup   

[Bug ipa/60243] IPA is slow on large cgraph tree

2019-10-07 Thread jamborm at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #26 from Martin Jambor  ---
With new IPA-SRA, the situation has improved quite a bit, see below
where old-ipa-sra is trunk r275981 and new-ipa-sra is trunk r275982
(arrival of new IPA-SRA):

$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0
~/gcc/old-ipa-sra/inst/bin/gcc -O0 -fno-inline -S pr60243.c
real=64.20 user=63.37

$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0
~/gcc/old-ipa-sra/inst/bin/gcc -O1 -fno-inline -S pr60243.c 
real=90.80 user=89.84

$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0
~/gcc/old-ipa-sra/inst/bin/gcc -O2 -S pr60243.c 
real=235.18 user=233.77

$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0
~/gcc/old-ipa-sra/inst/bin/gcc -O2 -fno-inline -S pr60243.c 
real=198.59 user=197.27

$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0
~/gcc/new-ipa-sra/inst/bin/gcc -O2 -S pr60243.c 
real=114.68 user=113.76

$ /usr/bin/time -f 'real=%e user=%U' taskset -c 0
~/gcc/new-ipa-sra/inst/bin/gcc -O2 -fno-inline -S pr60243.c 
real=88.40 user=87.41


$ taskset -c 0 ~/gcc/new-ipa-sra/inst/bin/gcc -O2 -S pr60243.c -ftime-report
(showing only IPA passes and passes taking more than 1% of usr time)
 phase parsing  :   9.57 (  8%)   6.93 ( 75%)  16.51 ( 13%)
 655448 kB ( 20%)
 phase opt and generate : 105.13 ( 92%)   2.34 ( 25%) 107.83 ( 87%)
2619926 kB ( 80%)
 callgraph functions expansion  :  18.05 ( 16%)   1.34 ( 14%)  19.71 ( 16%)
 302442 kB (  9%)
 callgraph ipa passes   :  77.51 ( 68%)   0.50 (  5%)  78.06 ( 63%)
 623696 kB ( 19%)
 ipa function summary   :   0.15 (  0%)   0.01 (  0%)   0.16 (  0%)
   1494 kB (  0%)
 ipa dead code removal  :   0.32 (  0%)   0.00 (  0%)   0.29 (  0%)
  0 kB (  0%)
 ipa cp :   1.10 (  1%)   0.05 (  1%)   1.13 (  1%)
 326688 kB ( 10%)
 ipa inlining heuristics:  17.85 ( 16%)   0.06 (  1%)  17.82 ( 14%)
  83762 kB (  3%)
 ipa function splitting :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)
  0 kB (  0%)
 ipa various optimizations  :   0.63 (  1%)   0.28 (  3%)   0.96 (  1%)
 131752 kB (  4%)
 ipa reference  :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)
  0 kB (  0%)
 ipa profile:  14.66 ( 13%)   0.00 (  0%)  14.67 ( 12%)
  0 kB (  0%)
 ipa pure const :   0.36 (  0%)   0.04 (  0%)   0.60 (  0%)
  0 kB (  0%)
 ipa icf:   0.17 (  0%)   0.01 (  0%)   0.19 (  0%)
  0 kB (  0%)
 ipa SRA:   0.21 (  0%)   0.00 (  0%)   0.23 (  0%)
102 kB (  0%)
 ipa free inline summary:   0.05 (  0%)   0.00 (  0%)   0.04 (  0%)
  0 kB (  0%)
 preprocessing  :   4.20 (  4%)   3.31 ( 36%)   7.77 (  6%)
 384133 kB ( 12%)
 lexical analysis   :   2.46 (  2%)   1.80 ( 19%)   3.95 (  3%)
  0 kB (  0%)
 parser function body   :   2.71 (  2%)   1.82 ( 20%)   4.57 (  4%)
 269874 kB (  8%)
 early inlining heuristics  :  12.82 ( 11%)   0.03 (  0%)  12.71 ( 10%)
   4031 kB (  0%)
 inline parameters  :   8.01 (  7%)   0.12 (  1%)   8.27 (  7%)
  30845 kB (  1%)
 tree CFG construction  :   5.23 (  5%)   0.04 (  0%)   5.03 (  4%)
 628095 kB ( 19%)
 tree SSA rewrite   :   3.42 (  3%)   0.02 (  0%)   3.39 (  3%)
  93305 kB (  3%)
 tree operand scan  :  17.53 ( 15%)   0.26 (  3%)  17.77 ( 14%)
  96568 kB (  3%)

Essentially, -O2 -fno-inline is now as fast as -O1 -fno-inline.

[Bug ipa/60243] IPA is slow on large cgraph tree

2018-11-19 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed|2017-11-17 00:00:00 |2018-11-19

--- Comment #25 from Richard Biener  ---
(In reply to Martin Liška from comment #24)
> Can the bug be marked as resolved?

I don't see how.  Jakub's testcase:

 ipa inlining heuristics:  27.66 (  8%)   0.00 (  0%)  27.66 (  8%)
  0 kB (  0%)
 ipa profile:  18.72 (  6%)   0.00 (  0%)  18.71 (  5%)
  0 kB (  0%)
 ipa SRA: 190.05 ( 58%)   1.44 (  9%) 191.77 ( 56%)
 717305 kB ( 22%)
 early inlining heuristics  :  24.01 (  7%)   0.01 (  0%)  24.18 (  7%)
   2357 kB (  0%)
 tree operand scan  :  13.67 (  4%)   0.68 (  4%)  14.12 (  4%)
  95009 kB (  3%)
 TOTAL  : 325.67 16.14 343.04
3319727 kB

so it's all IPA and a little operand scanner.

[Bug ipa/60243] IPA is slow on large cgraph tree

2018-11-19 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #24 from Martin Liška  ---
Can the bug be marked as resolved?

[Bug ipa/60243] IPA is slow on large cgraph tree

2018-09-04 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #23 from Eric Gallager  ---
(In reply to Jan Hubicka from comment #22)
> > The IPA SRA time is all spent in compute_fn_summary via convert_callers.
> > Not sure why that's necessary here?  Martin, in r152368 you reduced those
> > to once-per-caller but obviously if each function calls each other function
> > as in this testcase this is still O(n^2).  Why's the summary not simply
> > recomputed when we process the caller next?  Thus at most N times?
> 
> This is because summary needs to be ready for early inliner to decide whether
> caller is good for inlining or not.  I think we can simply mark it as dirty
> and compute on demand from the inliner.
> 
> I also have finally working patches for incremental update of inline summary
> in the IPA inliner.
> 

Cool, looking forward to seeing those patches!

[Bug ipa/60243] IPA is slow on large cgraph tree

2018-06-05 Thread hubicka at ucw dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #22 from Jan Hubicka  ---
> The IPA SRA time is all spent in compute_fn_summary via convert_callers.
> Not sure why that's necessary here?  Martin, in r152368 you reduced those
> to once-per-caller but obviously if each function calls each other function
> as in this testcase this is still O(n^2).  Why's the summary not simply
> recomputed when we process the caller next?  Thus at most N times?

This is because the summary needs to be ready for the early inliner to decide
whether the caller is good for inlining or not.  I think we can simply mark it
as dirty and compute on demand from the inliner.

I also have finally working patches for incremental update of inline summary in
the IPA inliner.

Honza
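
The "mark it as dirty and compute on demand" idea could look roughly like the
sketch below. This is only an illustration under stated assumptions: struct
fn_summary, summary_invalidate, summary_get_size and expensive_size_estimate
are hypothetical names, not GCC's inline summary API, and the size formula is
a placeholder.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-function summary; not GCC's structure.  */
struct fn_summary
{
  int size;            /* cached estimate */
  bool dirty;          /* true when SIZE is stale */
  int recomputations;  /* how often the expensive path ran */
};

/* Placeholder for the expensive analysis (compute_fn_summary in the thread).  */
static int
expensive_size_estimate (int n_stmts)
{
  assert (n_stmts >= 0);
  return n_stmts * 2;
}

/* Called whenever a caller is rewritten: cheap, no recomputation.  */
static void
summary_invalidate (struct fn_summary *s)
{
  s->dirty = true;
}

/* Called by a consumer such as the inliner: recompute only if stale.  */
static int
summary_get_size (struct fn_summary *s, int n_stmts)
{
  if (s->dirty)
    {
      s->size = expensive_size_estimate (n_stmts);
      s->dirty = false;
      s->recomputations++;
    }
  return s->size;
}
```

With this shape, any number of invalidations between two queries costs a
single recomputation, which is what would bound the work per function.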

[Bug ipa/60243] IPA is slow on large cgraph tree

2018-06-05 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

Richard Biener  changed:

   What|Removed |Added

 CC||jamborm at gcc dot gnu.org

--- Comment #21 from Richard Biener  ---
Current trunk at -O2 -fno-checking (w/ otherwise checking enabled):

Time variable   usr   sys  wall
  GGC
 phase setup:   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)
   1245 kB (  0%)
 phase parsing  :  16.72 (  6%)  15.15 ( 75%)  31.86 ( 10%)
 612162 kB ( 18%)
 phase opt and generate : 272.51 ( 94%)   5.08 ( 25%) 277.63 ( 90%)
2719266 kB ( 82%)
 ipa inlining heuristics:  31.82 ( 11%)   0.00 (  0%)  31.85 ( 10%)
  0 kB (  0%)
 ipa profile:   9.92 (  3%)   0.00 (  0%)   9.93 (  3%)
  0 kB (  0%)
 ipa SRA: 153.77 ( 53%)   1.81 (  9%) 155.54 ( 50%)
 741949 kB ( 22%)
 early inlining heuristics  :  24.54 (  8%)   0.03 (  0%)  24.65 (  8%)
   2987 kB (  0%)

at -O -g we can also see to my surprise:

 tree CFG construction  :   6.27 (  4%)   0.04 (  0%)   6.28 (  4%)
 628095 kB ( 15%)
 tree operand scan  :   3.78 (  3%)   0.99 (  4%)   5.01 (  3%)
  47597 kB (  1%)
 tree CFG cleanup   :   7.51 (  5%)   0.05 (  0%)   7.71 (  5%)
  0 kB (  0%)

the tree CFG construction time is _entirely_ spent in assign_discriminators!
That's because expand_location is costly and the discriminator_per_locus
hashtable does that all the time.  It's also because the testcase
sits on a single line.  The whole code seems odd to me as well given
it doesn't very well handle trailing or leading UNKNOWN_LOCATION stmts.
I also wonder why it is done at CFG construction time.

The IPA SRA time is all spent in compute_fn_summary via convert_callers.
Not sure why that's necessary here?  Martin, in r152368 you reduced those
to once-per-caller but obviously if each function calls each other function
as in this testcase this is still O(n^2).  Why's the summary not simply
recomputed when we process the caller next?  Thus at most N times?

[Bug ipa/60243] IPA is slow on large cgraph tree

2017-11-20 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #20 from rguenther at suse dot de  ---
On Sun, 19 Nov 2017, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
> 
> --- Comment #19 from Jan Hubicka  ---
> Author: hubicka
> Date: Sun Nov 19 18:55:30 2017
> New Revision: 254934
> 
> URL: https://gcc.gnu.org/viewcvs?rev=254934&root=gcc&view=rev
> Log:
> PR ipa/60243
> * tree-inline.c (estimate_num_insns): Set to 1 at least.
> 
> Modified:
> trunk/gcc/ChangeLog
> trunk/gcc/tree-inline.c

While this fixes the new regression, the apparent quadratic behavior of IPA SRA
remains.

I'll add the testcase to our "random" set of testcases in the C++ bench.

[Bug ipa/60243] IPA is slow on large cgraph tree

2017-11-19 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #19 from Jan Hubicka  ---
Author: hubicka
Date: Sun Nov 19 18:55:30 2017
New Revision: 254934

URL: https://gcc.gnu.org/viewcvs?rev=254934&root=gcc&view=rev
Log:
PR ipa/60243
* tree-inline.c (estimate_num_insns): Set to 1 at least.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-inline.c

[Bug ipa/60243] IPA is slow on large cgraph tree

2017-11-19 Thread hubicka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #18 from Jan Hubicka  ---
Returning MAX (1, count) indeed seems like a very good idea to me.  We need to
keep those in control :)

I am testing patch for that.

[Bug ipa/60243] IPA is slow on large cgraph tree

2017-11-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #17 from Richard Biener  ---
So, add a comment in the asm to make the testcase test the same as originally
for this PR (seems to peak at ~2GB then).

Execution times (seconds)
 phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall 
  1182 kB ( 0%) ggc
 phase parsing   :   6.53 ( 3%) usr   7.40 (73%) sys  13.93 ( 6%) wall 
611643 kB (19%) ggc
 phase opt and generate  : 202.70 (97%) usr   2.70 (27%) sys 205.41 (94%) wall
2569108 kB (81%) ggc
 ipa profile :  14.77 ( 7%) usr   0.00 ( 0%) sys  14.77 ( 7%) wall 
 0 kB ( 0%) ggc
 ipa SRA : 127.88 (61%) usr   0.89 ( 9%) sys 129.17 (59%) wall 
619431 kB (19%) ggc
 early inlining heuristics:   3.74 ( 2%) usr   0.00 ( 0%) sys   3.64 ( 2%) wall
   1928 kB ( 0%) ggc
 tree CFG construction   :   8.73 ( 4%) usr   0.05 ( 0%) sys   8.77 ( 4%) wall 
651524 kB (20%) ggc
 tree operand scan   :  10.61 ( 5%) usr   0.33 ( 3%) sys  10.77 ( 5%) wall 
 95009 kB ( 3%) ggc
 scheduling 2:   3.69 ( 2%) usr   0.02 ( 0%) sys   3.80 ( 2%) wall 
   502 kB ( 0%) ggc
 TOTAL : 209.23 10.10   219.35
3181942 kB

[Bug ipa/60243] IPA is slow on large cgraph tree

2017-11-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

Richard Biener  changed:

   What|Removed |Added

   Keywords||memory-hog
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-11-17
 Ever confirmed|0   |1
  Known to fail||7.2.1

--- Comment #16 from Richard Biener  ---
Not by this commit.  Jakub's testcase is still slow in GCC 7 (and uses >28GB
memory - ick, didn't even finish compiling).

We seem to blow up during early inlining here, because we get BBs with
millions of

__asm__ __volatile__("" :  :  : "memory");
__asm__ __volatile__("" :  :  : "memory");
__asm__ __volatile__("" :  :  : "memory");
__asm__ __volatile__("" :  :  : "memory");
__asm__ __volatile__("" :  :  : "memory");
__asm__ __volatile__("" :  :  : "memory");
__asm__ __volatile__("" :  :  : "memory");
__asm__ __volatile__("" :  :  : "memory");
__asm__ __volatile__("" :  :  : "memory");
__asm__ __volatile__("" :  :  : "memory");
...

counting those as zero size probably isn't wise if we don't "optimize"
them during inlining...

This issue likely hides the underlying old issue.

    case GIMPLE_ASM:
      {
        int count = asm_str_count (gimple_asm_string (as_a <gasm *> (stmt)));
        /* 1000 means infinity. This avoids overflows later
           with very long asm statements.  */
        if (count > 1000)
          count = 1000;
        return count;
      }

should return MAX (1, count), i.e. at least 1, even if in this case the asm
doesn't generate any code.
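
As a standalone sketch of the intended clamping (a lower bound of 1 alongside
the existing cap of 1000), assuming count is the non-negative number returned
by asm_str_count; clamp_asm_size is a hypothetical helper name, not a function
in tree-inline.c:

```c
#include <assert.h>

/* Hypothetical helper mirroring the snippet above: COUNT is the number of
   instructions counted in the asm string.  */
static int
clamp_asm_size (int count)
{
  assert (count >= 0);
  if (count > 1000)   /* 1000 stands for "infinity" and avoids overflows */
    count = 1000;
  if (count < 1)      /* never report a statement as free */
    count = 1;
  return count;
}
```

An empty asm string would then be counted as size 1 rather than 0, so blocks
with millions of such statements no longer look free to the inliner.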

[Bug ipa/60243] IPA is slow on large cgraph tree

2017-11-16 Thread egallager at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

Eric Gallager  changed:

   What|Removed |Added

 CC||egallager at gcc dot gnu.org

--- Comment #15 from Eric Gallager  ---
(In reply to Jan Hubicka from comment #14)
> Author: hubicka
> Date: Fri Mar 28 19:50:28 2014
> New Revision: 208916
> 
> URL: http://gcc.gnu.org/viewcvs?rev=208916&root=gcc&view=rev
> Log:
>   PR ipa/60243
>   * ipa-inline.c (want_inline_small_function_p): Short circuit large
>   functions; reorganize to make cheap checks first.
>   (inline_small_functions): Do not estimate growth when dumping;
>   it is expensive.
>   * ipa-inline.h (inline_summary): Add min_size.
>   (growth_likely_positive): New function.
>   * ipa-inline-analysis.c (dump_inline_summary): Add min_size.
>   (set_cond_stmt_execution_predicate): Cleanup.
>   (estimate_edge_size_and_time): Compute min_size.
>   (estimate_calls_size_and_time): Likewise.
>   (estimate_node_size_and_time): Likewise.
>   (inline_update_overall_summary): Update min_size.
>   (do_estimate_edge_time): Likewise.
>   (do_estimate_edge_size): Update.
>   (do_estimate_edge_hints): Update.
>   (growth_likely_positive): New function.
> 
> Modified:
> trunk/gcc/ChangeLog
> trunk/gcc/ipa-inline-analysis.c
> trunk/gcc/ipa-inline.c
> trunk/gcc/ipa-inline.h

Did this fix it?

[Bug ipa/60243] IPA is slow on large cgraph tree

2015-06-23 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
Bug 60243 depends on bug 60315, which changed state.

Bug 60315 Summary: [4.8 Regression] template constructor switch optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60315

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-03-28 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #14 from Jan Hubicka hubicka at gcc dot gnu.org ---
Author: hubicka
Date: Fri Mar 28 19:50:28 2014
New Revision: 208916

URL: http://gcc.gnu.org/viewcvs?rev=208916&root=gcc&view=rev
Log:
PR ipa/60243
* ipa-inline.c (want_inline_small_function_p): Short circuit large
functions; reorganize to make cheap checks first.
(inline_small_functions): Do not estimate growth when dumping;
it is expensive.
* ipa-inline.h (inline_summary): Add min_size.
(growth_likely_positive): New function.
* ipa-inline-analysis.c (dump_inline_summary): Add min_size.
(set_cond_stmt_execution_predicate): Cleanup.
(estimate_edge_size_and_time): Compute min_size.
(estimate_calls_size_and_time): Likewise.
(estimate_node_size_and_time): Likewise.
(inline_update_overall_summary): Update min_size.
(do_estimate_edge_time): Likewise.
(do_estimate_edge_size): Update.
(do_estimate_edge_hints): Update.
(growth_likely_positive): New function.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-inline-analysis.c
trunk/gcc/ipa-inline.c
trunk/gcc/ipa-inline.h


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-03-25 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #13 from Jan Hubicka hubicka at gcc dot gnu.org ---
BTW, when compiled with the C++ FE we seem to have an important bottleneck in
linemap_macro_map_lookup.


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-03-03 Thread rguenther at suse dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #12 from rguenther at suse dot de rguenther at suse dot de ---
On Sun, 2 Mar 2014, hubicka at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243
> 
> --- Comment #11 from Jan Hubicka hubicka at gcc dot gnu.org ---
> Created attachment 32244
>   --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32244&action=edit
> WIP patch
> 
> this patch cuts some redundant work on estimating size of functions that will
> be too large to be inlined anyway. Currently inliner spends a lot of time
> computing properties of these functions (since small and inlinable functions
> are also fast to estimate)
> 
> The patch doesn't really save much time building libreoffice/firefox. I will
> experiment with it a bit more.

Does it help PR60315?  That one is even more an excessive example.


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-03-02 Thread hubicka at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #11 from Jan Hubicka hubicka at gcc dot gnu.org ---
Created attachment 32244
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32244&action=edit
WIP patch

this patch cuts some redundant work on estimating size of functions that will
be too large to be inlined anyway. Currently inliner spends a lot of time
computing properties of these functions (since small and inlinable functions are
also fast to estimate)

The patch doesn't really save much time building libreoffice/firefox. I will
experiment with it a bit more.


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-19 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #8 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Wed Feb 19 09:29:34 2014
New Revision: 207879

URL: http://gcc.gnu.org/viewcvs?rev=207879&root=gcc&view=rev
Log:
2014-02-19  Richard Biener  rguent...@suse.de

PR ipa/60243
* ipa-prop.c: Include stringpool.h and tree-ssanames.h.
(ipa_modify_call_arguments): Emit an argument load explicitely and
preserve virtual SSA form there and for the replacement call.
Do not update SSA form nor free dominance info.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-prop.c


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-19 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #9 from Richard Biener rguenth at gcc dot gnu.org ---
Author: rguenth
Date: Wed Feb 19 14:25:47 2014
New Revision: 207899

URL: http://gcc.gnu.org/viewcvs?rev=207899&root=gcc&view=rev
Log:
2014-02-19  Richard Biener  rguent...@suse.de

PR ipa/60243
* tree-inline.c (estimate_num_insns): Avoid calling cgraph_get_node
for all calls.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-inline.c


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-19 Thread hubicka at ucw dot cz
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #10 from Jan Hubicka hubicka at ucw dot cz ---
> --- Comment #3 from Richard Biener rguenth at gcc dot gnu.org ---
> estimate_calls_size_and_time is quite high on the profile - called via
> do_estimate_edge_size it walks callgraph edges O(n^2).  It seems that
> the idea of having a cache is worse than devising an algorithm to
> compute sizes and times for the whole cgraph at once?

Yep, the problem is that they are changing as the inlining progresses, since
we propagate predicates on them on each inline.  I will check the testcase.


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-18 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

Richard Biener rguenth at gcc dot gnu.org changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #3 from Richard Biener rguenth at gcc dot gnu.org ---
estimate_calls_size_and_time is quite high on the profile - called via
do_estimate_edge_size it walks callgraph edges O(n^2).  It seems that
the idea of having a cache is worse than devising an algorithm to
compute sizes and times for the whole cgraph at once?

The next high thing on the profile is ipa_propagate_frequency_1 called
from do_estimate_growth (same thing, walks over all call edges again).

The ipa-profile slowness is the same - ipa_propagate_frequency.

The testcase has N cgraph nodes and N^2/2 call edges, so it's quite unusual
of course.
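
To put a number on that, here is a small worked calculation. It assumes every
pair of functions is joined by exactly one call edge, which is how the
"N^2/2" figure above reads; call_edges is a hypothetical helper name.

```c
#include <assert.h>

/* Number of call edges when every pair of the N functions is joined by one
   call: (N - 1) + (N - 2) + ... + 0 = N * (N - 1) / 2.  */
static unsigned long
call_edges (unsigned long n)
{
  return n * (n - 1) / 2;
}
```

For N = 1000 this gives 499500 edges, so any per-node step that walks all call
edges again performs on the order of 10^8 to 10^9 edge visits in total.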


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-18 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #4 from Richard Biener rguenth at gcc dot gnu.org ---
Oh, and ipa_profile_generate_summary is dominated by symtab_get_node ()
hashtable lookup ...


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-18 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #5 from Richard Biener rguenth at gcc dot gnu.org ---
(In reply to Richard Biener from comment #4)
> Oh, and ipa_profile_generate_summary is dominated by symtab_get_node ()
> hashtable lookup ...

here:

int
estimate_num_insns (gimple stmt, eni_weights *weights)
{
        /* Do not special case builtins where we see the body.
           This just confuse inliner.  */
...
        else if (!(decl = gimple_call_fndecl (stmt))
                 || !(node = cgraph_get_node (decl))
                 || node->definition)
          ;

a simple re-org will fix that.  I'll do that.
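
One way such a re-org can avoid the lookup for most calls is to test a cheap
property first and only then consult the hash table. The sketch below is an
assumption about the general shape, not the actual patch: it supposes the
lookup only matters for builtins, and expensive_lookup_has_body and
builtin_with_visible_body are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

static int lookups;  /* counts calls to the expensive lookup */

/* Hypothetical stand-in for a hash-table lookup such as cgraph_get_node;
   the even/odd rule is arbitrary and only makes the result deterministic.  */
static bool
expensive_lookup_has_body (int decl)
{
  lookups++;
  return decl % 2 == 0;
}

/* Test the cheap flag first, so that ordinary calls short-circuit before
   reaching the hash table at all.  */
static bool
builtin_with_visible_body (bool is_builtin, int decl)
{
  return is_builtin && expensive_lookup_has_body (decl);
}
```

Since && evaluates left to right and stops at the first false operand, the
lookup counter only moves for calls that pass the cheap test.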


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-18 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #6 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 32162
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32162&action=edit
patch 1


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-18 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #7 from Richard Biener rguenth at gcc dot gnu.org ---
Created attachment 32163
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32163&action=edit
patch 2


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-17 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

--- Comment #1 from Richard Biener rguenth at gcc dot gnu.org ---
-O2 -fno-inline


[Bug ipa/60243] IPA is slow on large cgraph tree

2014-02-17 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60243

Jakub Jelinek jakub at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org ---
So:
#define A(n) static void test##n (int);
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) \
A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) \
B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) \
C(n##8) C(n##9)
D(1)
#undef A
#define E(m, n) if (n > m) test##n (i);
#define F(m, n) E(m, n##0) E(m, n##1) E(m, n##2) E(m, n##3) E(m, n##4) \
E(m, n##5) E(m, n##6) E(m, n##7) E(m, n##8) E(m, n##9)
#define G(m, n) F(m, n##0) F(m, n##1) F(m, n##2) F(m, n##3) F(m, n##4) \
F(m, n##5) F(m, n##6) F(m, n##7) F(m, n##8) F(m, n##9)
#define H(m, n) G(m, n##0) G(m, n##1) G(m, n##2) G(m, n##3) G(m, n##4) \
G(m, n##5) G(m, n##6) G(m, n##7) G(m, n##8) G(m, n##9)
#define A(n) \
static void test##n (int i)\
{\
  asm ("" : : : "memory");\
  H(n, 1)\
}
D(1)

int
main ()
{
  test1000 (5);
  return 0;
}

so that we have something for the testsuite?