[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #15 from Patrick Palka --- Author: ppalka Date: Fri Apr 8 20:17:10 2016 New Revision: 234837 URL: https://gcc.gnu.org/viewcvs?rev=234837=gcc=rev Log: Fix PR c++/70590 (error: location references block not in block tree) gcc/cp/ChangeLog: PR c++/70590 PR c++/70452 * constexpr.c (cxx_eval_outermost_expression): Call unshare_expr on the result if it's not a CONSTRUCTOR. gcc/testsuite/ChangeLog: PR c++/70590 PR c++/70452 * g++.dg/pr70590.C: New test. * g++.dg/pr70590-2.C: New test. Added: trunk/gcc/testsuite/g++.dg/pr70590-2.C trunk/gcc/testsuite/g++.dg/pr70590.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/constexpr.c trunk/gcc/testsuite/ChangeLog
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #14 from Patrick Palka --- trunk is now at Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 1287 kB ( 1%) ggc phase parsing : 1.28 (100%) usr 0.20 (100%) sys 1.48 (99%) wall 156462 kB (99%) ggc |name lookup: 0.03 ( 2%) usr 0.01 ( 5%) sys 0.04 ( 3%) wall 81 kB ( 0%) ggc |overload resolution: 0.04 ( 3%) usr 0.00 ( 0%) sys 0.05 ( 3%) wall 628 kB ( 0%) ggc preprocessing : 0.03 ( 2%) usr 0.04 (20%) sys 0.09 ( 6%) wall 2048 kB ( 1%) ggc parser (global) : 0.02 ( 2%) usr 0.07 (35%) sys 0.06 ( 4%) wall 7735 kB ( 5%) ggc parser function body: 1.23 (96%) usr 0.09 (45%) sys 1.33 (89%) wall 146667 kB (93%) ggc TOTAL : 1.28 0.20 1.49 157759 kB The remaining extra memory and runtime overhead relative to 4.9 is mostly due to the constexpr_call_table which caches the result of each constexpr call that's been evaluated. This table now gets cleared during GC though which in real programs triggers frequently enough that its size won't be a problem.
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #13 from Patrick Palka --- Author: ppalka Date: Thu Apr 7 16:12:05 2016 New Revision: 234810 URL: https://gcc.gnu.org/viewcvs?rev=234810=gcc=rev Log: Avoid needless unsharing during constexpr evaluation (PR c++/70452) gcc/cp/ChangeLog: PR c++/70452 * constexpr.c (find_constructor): New function. (unshare_constructor): New function. (cxx_eval_call_expression): Use unshare_constructor instead of unshare_expr. (find_array_ctor_elt): Likewise. (cxx_eval_vec_init_1): Likewise. (cxx_eval_store_expression): Likewise. (cxx_eval_constant_expression): Likewise. Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/constexpr.c
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #12 from Patrick Palka --- Turns out that a single line of code is responsible for the 50MB increase in memory usage relative to 4.9, and that's the call to unshare_expr in cxx_eval_call_expression: /* Associate the bindings with the remapped parms. */ tree bound = new_call.bindings; tree remapped = parms; while (bound) { tree oparm = TREE_PURPOSE (bound); tree arg = TREE_VALUE (bound); gcc_assert (DECL_NAME (remapped) == DECL_NAME (oparm)); /* Don't share a CONSTRUCTOR that might be changed. */ arg = unshare_expr (arg); // <--- ctx->values->put (remapped, arg); bound = TREE_CHAIN (bound); remapped = DECL_CHAIN (remapped); } We unshare each function argument unconditionally even though it's only necessary to do so for CONSTRUCTORs. This apparently creates a lot of garbage. I wonder if guarding the unsharing with "TREE_CODE (arg) == CONSTRUCTOR" would be sufficient?
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #11 from Patrick Palka --- gcc 5 shows: Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 1311 kB ( 0%) ggc phase parsing : 2.10 (100%) usr 0.28 (100%) sys 2.37 (100%) wall 516276 kB (100%) ggc |name lookup: 0.03 ( 1%) usr 0.00 ( 0%) sys 0.02 ( 1%) wall 83 kB ( 0%) ggc |overload resolution: 0.02 ( 1%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall 628 kB ( 0%) ggc preprocessing : 0.01 ( 0%) usr 0.06 (21%) sys 0.06 ( 3%) wall 2048 kB ( 0%) ggc parser (global) : 0.04 ( 2%) usr 0.04 (14%) sys 0.09 ( 4%) wall 16180 kB ( 3%) ggc parser function body: 2.05 (98%) usr 0.18 (64%) sys 2.22 (93%) wall 498040 kB (96%) ggc TOTAL : 2.10 0.28 2.38 517605 kB trunk now shows: Execution times (seconds) phase setup : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 1%) wall 1287 kB ( 1%) ggc phase parsing : 1.31 (100%) usr 0.21 (100%) sys 1.51 (99%) wall 203649 kB (99%) ggc |name lookup: 0.02 ( 2%) usr 0.00 ( 0%) sys 0.03 ( 2%) wall 81 kB ( 0%) ggc |overload resolution: 0.04 ( 3%) usr 0.00 ( 0%) sys 0.04 ( 3%) wall 628 kB ( 0%) ggc preprocessing : 0.02 ( 2%) usr 0.08 (38%) sys 0.08 ( 5%) wall 2048 kB ( 1%) ggc parser (global) : 0.02 ( 2%) usr 0.02 (10%) sys 0.06 ( 4%) wall 7735 kB ( 4%) ggc parser function body: 1.27 (97%) usr 0.11 (52%) sys 1.37 (90%) wall 193854 kB (95%) ggc TOTAL : 1.31 0.21 1.52 204947 kB Hmm, peak memory usage is still about 50MB more than 4.9...
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #10 from Patrick Palka --- Author: ppalka Date: Tue Apr 5 16:40:00 2016 New Revision: 234753 URL: https://gcc.gnu.org/viewcvs?rev=234753=gcc=rev Log: Fix PR c++/70452 (regression in C++ parsing performance) gcc/cp/ChangeLog: PR c++/70452 * constexpr.c (struct fundef_copy): New struct. (struct fundef_copies_table_t): New struct. (fundef_copies_table): New static variable. (maybe_initialize_fundef_copies_table): New static function. (get_fundef_copy): New static function. (save_fundef_copy): New static function. (cxx_eval_call_expression): Use get_fundef_copy, and save_fundef_copy. (constexpr_call_table): Add "deletable" GTY marker. gcc/testsuite/ChangeLog: PR c++/70452 * g++.dg/ext/constexpr-vla4.C: New test. Added: trunk/gcc/testsuite/g++.dg/ext/constexpr-vla4.C Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/constexpr.c trunk/gcc/testsuite/ChangeLog
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #9 from Patrick Palka --- Author: ppalka Date: Tue Apr 5 01:20:00 2016 New Revision: 234732 URL: https://gcc.gnu.org/viewcvs?rev=234732=gcc=rev Log: Remove class cache_map and use ggc hash_maps instead (PR c++/70452) gcc/cp/ChangeLog: PR c++/70452 * cp-tree.h (class cache_map): Remove. * constexpr.c (cv_cache): Change type to GTY((deletable)) hash_map*. (maybe_constant_value): Adjust following the change to cv_cache. (clear_cv_cache): New static function. (clear_cv_and_fold_caches): Use it. * cp-gimplify.c (fold_cache): Change type to GTY((deletable)) hash_map *. (clear_fold_cache): Adjust following the change to fold_cache. (cp_fold): Likewise. Modified: trunk/gcc/cp/ChangeLog trunk/gcc/cp/constexpr.c trunk/gcc/cp/cp-gimplify.c trunk/gcc/cp/cp-tree.h
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #8 from Patrick Palka --- So with the patch instead of making ~200k total copies of the same fn (one for each recursive call) we only make ~15 total copies (the maximum recursion depth of the function).
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #7 from Patrick Palka --- Created attachment 38155 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38155=edit patch that reuses the function copies I attached a small patch (not commented yet) that reduces the runtime (of a checking compiler) from 3.5s to 2s and reduces GGC memory usage from 550MB to 200MB when compiling the test case in comment #1. What it does is it maintains a per-FUNCTION_DECL freelist of body/parm/res copies that were created by copy_fn(). When a constexpr call is finished it pushes the copied body/parm/res trees to the freelist and before a call is evaluated it tries to reuse the trees from the freelist, falling back to copy_fn() if the freelist is empty. AFAICT the reason we use copy_fn() in the first place is to make recursive constexpr calls work. If we didn't copy the function trees then recursive calls would refer the same VAR/PARM_DECL trees. In that respect I think this patch is safe because recursive calls to the same function will still use distinct trees since all the entries on the freelist are distinct copies. Does this seem approach sensible?
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 Patrick Palka changed: What|Removed |Added CC||ppalka at gcc dot gnu.org --- Comment #6 from Patrick Palka --- I wonder if the constexpr call context can be re-used when the call is a recursive tail call.
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #5 from Jakub Jelinek --- Indeed, started with r217664.
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #4 from Jakub Jelinek --- It would surprise me if the TREE_BLOCKS mattered much, but I don't see why anything would need those. In any case, this probably started with the C++14 constexpr support, maybe it would be better to copy less by copying only when you change something.
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 Richard Biener changed: What|Removed |Added CC||jason at gcc dot gnu.org --- Comment #3 from Richard Biener --- Jason, do the constexpr bodies need the BLOCK tree at all or can we simply re-map them all to NULL?
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2016-03-30 Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- Confirmed. For trunk I get Samples: 13K of event 'cycles', Event count (approx.): 11326375876 4.71% cc1plus cc1plus[.] copy_tree_body_r(tree_node**, int*, voi 4.49% cc1plus cc1plus[.] cxx_eval_constant_expression(constexpr_ 4.20% cc1plus cc1plus[.] ggc_internal_alloc(unsigned long, void 2.94% cc1plus cc1plus[.] remap_type(tree_node*, copy_body_data*) 2.69% cc1plus cc1plus[.] ggc_set_mark(void const*) 2.67% cc1plus cc1plus[.] gt_ggc_mx_lang_tree_node(void*) 2.41% cc1plus cc1plus[.] walk_tree_1(tree_node**, tree_node* (*) 2.29% cc1plus cc1plus[.] cxx_eval_call_expression(constexpr_ctx 2.11% cc1plus cc1plus[.] htab_find_slot_with_hash nothing "obvious" here other than that we're actually copying a lot of stuff just to evaluate it.
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 --- Comment #2 from Richard Biener --- Hashtable lookups are from location/block remapping done. I _think_ that if all the copy_body stuff we do for constexpr evaluation would just "drop" locations for the copy we'd still be fine constexpr wise but would save quite a lot of linemap overhead and compile-time (hash-table lookup for combined locs via set_block for example). So maybe add a new field to copy_body_data for this purpose.
[Bug c++/70452] [5/6 Regression] Regression in C++ parsing performance between 4.9.3 and 5.3.1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70452 Richard Biener changed: What|Removed |Added Keywords||compile-time-hog Target Milestone|--- |5.4 Summary|regression in C++ parsing |[5/6 Regression] Regression |performance between 4.9.3 |in C++ parsing performance |and 5.3.1 |between 4.9.3 and 5.3.1