[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-04-08 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #15 from Patrick Palka  ---
Author: ppalka
Date: Fri Apr  8 20:17:10 2016
New Revision: 234837

URL: https://gcc.gnu.org/viewcvs?rev=234837=gcc=rev
Log:
Fix PR c++/70590 (error: location references block not in block tree)

gcc/cp/ChangeLog:

PR c++/70590
PR c++/70452
* constexpr.c (cxx_eval_outermost_expression): Call unshare_expr
on the result if it's not a CONSTRUCTOR.

gcc/testsuite/ChangeLog:

PR c++/70590
PR c++/70452
* g++.dg/pr70590.C: New test.
* g++.dg/pr70590-2.C: New test.


Added:
trunk/gcc/testsuite/g++.dg/pr70590-2.C
trunk/gcc/testsuite/g++.dg/pr70590.C
Modified:
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/constexpr.c
trunk/gcc/testsuite/ChangeLog

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-04-07 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #14 from Patrick Palka  ---
trunk is now at

Execution times (seconds)
 phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall 
  1287 kB ( 1%) ggc
 phase parsing   :   1.28 (100%) usr   0.20 (100%) sys   1.48 (99%)
wall  156462 kB (99%) ggc
 |name lookup:   0.03 ( 2%) usr   0.01 ( 5%) sys   0.04 ( 3%) wall 
81 kB ( 0%) ggc
 |overload resolution:   0.04 ( 3%) usr   0.00 ( 0%) sys   0.05 ( 3%) wall 
   628 kB ( 0%) ggc
 preprocessing   :   0.03 ( 2%) usr   0.04 (20%) sys   0.09 ( 6%) wall 
  2048 kB ( 1%) ggc
 parser (global) :   0.02 ( 2%) usr   0.07 (35%) sys   0.06 ( 4%) wall 
  7735 kB ( 5%) ggc
 parser function body:   1.23 (96%) usr   0.09 (45%) sys   1.33 (89%) wall 
146667 kB (93%) ggc
 TOTAL :   1.28 0.20 1.49
157759 kB


The remaining extra memory and runtime overhead relative to 4.9 is mostly due
to the constexpr_call_table which caches the result of each constexpr call
that's been evaluated.  This table now gets cleared during GC though which in
real programs triggers frequently enough that its size won't be a problem.

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-04-07 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #13 from Patrick Palka  ---
Author: ppalka
Date: Thu Apr  7 16:12:05 2016
New Revision: 234810

URL: https://gcc.gnu.org/viewcvs?rev=234810=gcc=rev
Log:
Avoid needless unsharing during constexpr evaluation (PR c++/70452)

gcc/cp/ChangeLog:

PR c++/70452
* constexpr.c (find_constructor): New function.
(unshare_constructor): New function.
(cxx_eval_call_expression): Use unshare_constructor instead of
unshare_expr.
(find_array_ctor_elt): Likewise.
(cxx_eval_vec_init_1): Likewise.
(cxx_eval_store_expression): Likewise.
(cxx_eval_constant_expression): Likewise.


Modified:
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/constexpr.c

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-04-05 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #12 from Patrick Palka  ---
Turns out that a single line of code is responsible for the 50MB increase in
memory usage relative to 4.9, and that's the call to unshare_expr in
cxx_eval_call_expression:

  /* Associate the bindings with the remapped parms.  */
  tree bound = new_call.bindings;
  tree remapped = parms;
  while (bound)
{
  tree oparm = TREE_PURPOSE (bound);
  tree arg = TREE_VALUE (bound);
  gcc_assert (DECL_NAME (remapped) == DECL_NAME (oparm));
  /* Don't share a CONSTRUCTOR that might be changed.  */
  arg = unshare_expr (arg);  // <---
  ctx->values->put (remapped, arg);
  bound = TREE_CHAIN (bound);
  remapped = DECL_CHAIN (remapped);
}

We unshare each function argument unconditionally even though it's only
necessary to do so for CONSTRUCTORs.  This apparently creates a lot of garbage.
 I wonder if guarding the unsharing with "TREE_CODE (arg) == CONSTRUCTOR" would
be sufficient?

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-04-05 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #11 from Patrick Palka  ---
gcc 5 shows:

Execution times (seconds)
 phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall 
  1311 kB ( 0%) ggc
 phase parsing   :   2.10 (100%) usr   0.28 (100%) sys   2.37 (100%)
wall  516276 kB (100%) ggc
 |name lookup:   0.03 ( 1%) usr   0.00 ( 0%) sys   0.02 ( 1%) wall 
83 kB ( 0%) ggc
 |overload resolution:   0.02 ( 1%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall 
   628 kB ( 0%) ggc
 preprocessing   :   0.01 ( 0%) usr   0.06 (21%) sys   0.06 ( 3%) wall 
  2048 kB ( 0%) ggc
 parser (global) :   0.04 ( 2%) usr   0.04 (14%) sys   0.09 ( 4%) wall 
 16180 kB ( 3%) ggc
 parser function body:   2.05 (98%) usr   0.18 (64%) sys   2.22 (93%) wall 
498040 kB (96%) ggc
 TOTAL :   2.10 0.28 2.38
517605 kB

trunk now shows:

Execution times (seconds)
 phase setup :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 1%) wall 
  1287 kB ( 1%) ggc
 phase parsing   :   1.31 (100%) usr   0.21 (100%) sys   1.51 (99%)
wall  203649 kB (99%) ggc
 |name lookup:   0.02 ( 2%) usr   0.00 ( 0%) sys   0.03 ( 2%) wall 
81 kB ( 0%) ggc
 |overload resolution:   0.04 ( 3%) usr   0.00 ( 0%) sys   0.04 ( 3%) wall 
   628 kB ( 0%) ggc
 preprocessing   :   0.02 ( 2%) usr   0.08 (38%) sys   0.08 ( 5%) wall 
  2048 kB ( 1%) ggc
 parser (global) :   0.02 ( 2%) usr   0.02 (10%) sys   0.06 ( 4%) wall 
  7735 kB ( 4%) ggc
 parser function body:   1.27 (97%) usr   0.11 (52%) sys   1.37 (90%) wall 
193854 kB (95%) ggc
 TOTAL :   1.31 0.21 1.52
204947 kB

Hmm, peak memory usage is still about 50MB more than 4.9...

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-04-05 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #10 from Patrick Palka  ---
Author: ppalka
Date: Tue Apr  5 16:40:00 2016
New Revision: 234753

URL: https://gcc.gnu.org/viewcvs?rev=234753=gcc=rev
Log:
Fix PR c++/70452 (regression in C++ parsing performance)

gcc/cp/ChangeLog:

PR c++/70452
* constexpr.c (struct fundef_copy): New struct.
(struct fundef_copies_table_t): New struct.
(fundef_copies_table): New static variable.
(maybe_initialize_fundef_copies_table): New static function.
(get_fundef_copy): New static function.
(save_fundef_copy): New static function.
(cxx_eval_call_expression): Use get_fundef_copy, and
save_fundef_copy.
(constexpr_call_table): Add "deletable" GTY marker.

gcc/testsuite/ChangeLog:

PR c++/70452
* g++.dg/ext/constexpr-vla4.C: New test.


Added:
trunk/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
Modified:
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/constexpr.c
trunk/gcc/testsuite/ChangeLog

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-04-04 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #9 from Patrick Palka  ---
Author: ppalka
Date: Tue Apr  5 01:20:00 2016
New Revision: 234732

URL: https://gcc.gnu.org/viewcvs?rev=234732=gcc=rev
Log:
Remove class cache_map and use ggc hash_maps instead (PR c++/70452)

gcc/cp/ChangeLog:

PR c++/70452
* cp-tree.h (class cache_map): Remove.
* constexpr.c (cv_cache): Change type to
GTY((deletable)) hash_map *.
(maybe_constant_value): Adjust following the change to cv_cache.
(clear_cv_cache): New static function.
(clear_cv_and_fold_caches): Use it.
* cp-gimplify.c (fold_cache): Change type to
GTY((deletable)) hash_map *.
(clear_fold_cache): Adjust following the change to fold_cache.
(cp_fold): Likewise.


Modified:
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/constexpr.c
trunk/gcc/cp/cp-gimplify.c
trunk/gcc/cp/cp-tree.h

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-04-01 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #8 from Patrick Palka  ---
So with the patch instead of making ~200k total copies of the same fn (one for
each recursive call) we only make ~15 total copies (the maximum recursion depth
of the function).

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-04-01 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #7 from Patrick Palka  ---
Created attachment 38155
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38155=edit
patch that reuses the function copies

I attached a small patch (not commented yet) that reduces the runtime (of a
checking compiler) from 3.5s to 2s and reduces GGC memory usage from 550MB to
200MB when compiling the test case in comment #1.

What it does is it maintains a per-FUNCTION_DECL freelist of body/parm/res
copies that were created by copy_fn().  When a constexpr call is finished it
pushes the copied body/parm/res trees to the freelist and before a call is
evaluated it tries to reuse the trees from the freelist, falling back to
copy_fn() if the freelist is empty.

AFAICT the reason we use copy_fn() in the first place is to make recursive
constexpr calls work.  If we didn't copy the function trees then recursive
calls would refer the same VAR/PARM_DECL trees.  In that respect I think this
patch is safe because recursive calls to the same function will still use
distinct trees since all the entries on the freelist are distinct copies.

Does this seem approach sensible?

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-03-30 Thread ppalka at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

Patrick Palka  changed:

   What|Removed |Added

 CC||ppalka at gcc dot gnu.org

--- Comment #6 from Patrick Palka  ---
I wonder if the constexpr call context can be re-used when the call is a
recursive tail call.

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-03-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #5 from Jakub Jelinek  ---
Indeed, started with r217664.

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-03-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
It would surprise me if the TREE_BLOCKS mattered much, but I don't see why
anything would need those.  In any case, this probably started with the C++14
constexpr support, maybe it would be better to copy less by copying only when
you change something.

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-03-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

Richard Biener  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
Jason, do the constexpr bodies need the BLOCK tree at all or can we simply
re-map them all to NULL?

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-03-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-03-30
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.  For trunk I get

Samples: 13K of event 'cycles', Event count (approx.): 11326375876  
  4.71%  cc1plus  cc1plus[.] copy_tree_body_r(tree_node**, int*,
voi
  4.49%  cc1plus  cc1plus[.]
cxx_eval_constant_expression(constexpr_
  4.20%  cc1plus  cc1plus[.] ggc_internal_alloc(unsigned long, void
  2.94%  cc1plus  cc1plus[.] remap_type(tree_node*,
copy_body_data*)
  2.69%  cc1plus  cc1plus[.] ggc_set_mark(void const*)
  2.67%  cc1plus  cc1plus[.] gt_ggc_mx_lang_tree_node(void*)
  2.41%  cc1plus  cc1plus[.] walk_tree_1(tree_node**, tree_node*
(*)
  2.29%  cc1plus  cc1plus[.] cxx_eval_call_expression(constexpr_ctx
  2.11%  cc1plus  cc1plus[.] htab_find_slot_with_hash

nothing "obvious" here other than that we're actually copying a lot of stuff
just to evaluate it.

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-03-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

--- Comment #2 from Richard Biener  ---
Hashtable lookups are from location/block remapping done.  I _think_ that if
all the copy_body stuff we do for constexpr evaluation would just "drop"
locations for the copy we'd still be fine constexpr wise but would save
quite a lot of linemap overhead and compile-time (hash-table lookup for
combined locs via set_block for example).

So maybe add a new field to copy_body_data for this purpose.

[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1

2016-03-30 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452

Richard Biener  changed:

   What|Removed |Added

   Keywords||compile-time-hog
   Target Milestone|--- |5.4
Summary|regression in C++ parsing   |[5/6 Regression] Regression
   |performance between 4.9.3   |in C++ parsing performance
   |and 5.3.1   |between 4.9.3 and 5.3.1