On 04/07/2014 01:49 PM, Richard Biener wrote:
On Mon, Apr 7, 2014 at 1:25 PM, Martin Liška <mli...@suse.cz> wrote:On 04/03/2014 10:40 PM, Jan Hubicka wrote:Firefox: cgraph.c:869 (cgraph_create_edge_1) 0: 0.0% 0: 0.0% 130358176: 6.9% 0: 0.0% 1253444 cgraph.c:510 (cgraph_allocate_node) 0: 0.0% 0: 0.0% 182236800: 9.7% 0: 0.0% 555600 toplev.c:960 (realloc_for_line_map) 0: 0.0% 89503888: 5.5% 268468240:14.3% 160: 0.0% 13 tree-streamer-in.c:621 (streamer_alloc_tree) 93089976:77.5% 972848816:59.6% 639230248:33.9% 21332480:32.3% 13496198 Total 120076578 1632997043 1883064062 65981723 24732501 source location Garbage Freed Leak Overhead TimesHi, this is the linemap compression patch. For me it reduces realloc_for_line_map to about 60MB, 4% toplev.c:960 (realloc_for_line_map) 0: 0.0% 22395000: 1.5% 67141656: 4.1% 144: 0.0% 12 vec.h:626 ((null)) 0: 0.0% 134568360: 9.3% 75497528: 4.6% 1569368: 2.4% 1009306 vec.h:614 ((null)) 10659408: 8.3% 61265504: 4.2% 78711104: 4.8% 793720: 1.2% 632110 vec.h:698 ((null)) 908768: 0.7% 99564524: 6.9% 82255800: 5.0% 4201148: 6.5% 874628 vec.h:666 ((null)) 12840: 0.0% 73534476: 5.1% 92633604: 5.7% 2929176: 4.5% 776347 lto/lto.c:245 (lto_read_in_decl_state) 0: 0.0% 43115656: 3.0% 94137976: 5.8% 25396856:39.2% 2698570 cgraph.c:869 (cgraph_create_edge_1) 0: 0.0% 0: 0.0% 124069712: 7.6% 0: 0.0% 1192978 cgraph.c:510 (cgraph_allocate_node) 0: 0.0% 0: 0.0% 189855256:11.6% 0: 0.0% 578827 tree-streamer-in.c:621 (streamer_alloc_tree) 97891888:76.1% 892961808:61.6% 613594816:37.6% 22268544:34.3% 12574164 Total 128555402 1448954963 1633186049 64847582 24190936 source location Garbage Freed Leak Overhead TimesSo I get about 1GB of peak GGC memory and about 2.6GB for WPA in TOP. (still on the old tree) You may try to experiment with LOCATION_CACHE_SIZE. It sould be big enough so the locations get shared across different input files.Hi! Suggested patch looks fine, according to following data: https://drive.google.com/file/d/0B0pisUJ80pO1X05SMXdXS2ZScEE/edit?usp=sharing, it saves about ~1GB during non-parallel WPA phase. Average memory consumption during parallel WPA phase reduces by 1-2GB. It would be good to apply the patch.AFAIK we settled on a simpler one dropping columns at stream-out time that also helped. As for the correct way to do the optimization we agreed(?) that streaming the locations elsewhere and using references to them is more appropriate. At stream-in (or before stream-out) we can then read the location pairs and sort them before assigning linemap entries. Thus, similar to the on-the-side string stream use a location stream to stream locs. At stream-in, input it as a whole and assign linemap numbers in optimal order. Column dropping hack: Index: gcc/lto-streamer-out.c =================================================================== --- gcc/lto-streamer-out.c (revision 209181) +++ gcc/lto-streamer-out.c (working copy) @@ -176,6 +176,7 @@ lto_output_location (struct output_block return; xloc = expand_location (loc); + xloc.column = 0; bp_pack_value (bp, ob->current_file != xloc.file, 1); bp_pack_value (bp, ob->current_line != xloc.line, 1); maybe you can check the effect of that (we'd then set flag_diagnostics_show_caret to 0 for lto1 invocations and thus middle-end diagnostics). Richard.
I added new graph for 'xloc.column = 0' hack, just applied this single patch to trunk. Link: https://drive.google.com/file/d/0B0pisUJ80pO1MW11WHdjMk9KQnc/edit?usp=sharing
Martin
MartinHonza Index: lto-streamer-in.c =================================================================== --- lto-streamer-in.c (revision 209047) +++ lto-streamer-in.c (working copy) @@ -145,21 +145,49 @@ canon_file_name (const char *string) } +/* location_cache is used at LTO read in to avoid too many duplicates in + the linemap tables. */ + +#define LOCATION_CACHE_SIZE 524287 +struct location_cache_entry +{ + const char *file; + int line; + int col; + location_t location; +}; +static struct location_cache_entry *location_cache; + +/* Return hash of FILE/LINE/COL. */ + +int +location_cache_hash (const char *file, int line, int col) +{ + return iterative_hash_hashval_t ((size_t)file, + iterative_hash_hashval_t (line, col)) % LOCATION_CACHE_SIZE; +} + + /* Read a location bitpack from input block IB. */ location_t lto_input_location (struct bitpack_d *bp, struct data_in *data_in) { - static const char *current_file; - static int current_line; + static const char *current_file, *last_file; + static int current_line, last_line; static int current_col; bool file_change, line_change, column_change; unsigned len; - bool prev_file = current_file != NULL; + bool prev_file = last_file != NULL; + int hash; + const char *cfile; if (bp_unpack_value (bp, 1)) return UNKNOWN_LOCATION; + if (!location_cache) + location_cache = XCNEWVEC (struct location_cache_entry, LOCATION_CACHE_SIZE); + file_change = bp_unpack_value (bp, 1); line_change = bp_unpack_value (bp, 1); column_change = bp_unpack_value (bp, 1); @@ -175,18 +203,32 @@ lto_input_location (struct bitpack_d *bp if (column_change) current_col = bp_unpack_var_len_unsigned (bp); + cfile = current_file; + hash = location_cache_hash (cfile, current_line, current_col); - if (file_change) + if (location_cache[hash].file == cfile + && location_cache[hash].line == current_line + && location_cache[hash].col == current_col + 1) + return location_cache[hash].location; + location_cache[hash].file = cfile; + location_cache[hash].line = current_line; + location_cache[hash].col = current_col + 1; + + if (current_file != last_file) { if (prev_file) linemap_add (line_table, LC_LEAVE, false, NULL, 0); linemap_add (line_table, LC_ENTER, false, current_file, current_line); } - else if (line_change) + else if (current_line != last_line) linemap_line_start (line_table, current_line, current_col); - return linemap_position_for_column (line_table, current_col); + location_cache[hash].location + = linemap_position_for_column (line_table, current_col); + last_file = current_file; + last_line = current_line; + return location_cache[hash].location; } @@ -981,6 +1023,27 @@ input_function (tree fn_decl, struct dat } bsi = gsi_start_bb (bb); while (!gsi_end_p (bsi)) + { + gimple stmt = gsi_stmt (bsi); + /* If we're recompiling LTO objects with debug stmts but + we're not supposed to have debug stmts, remove them now. + We can't remove them earlier because this would cause uid + mismatches in fixups, but we can do it at this point, as + long as debug stmts don't require fixups. */ + if (!MAY_HAVE_DEBUG_STMTS && is_gimple_debug (stmt)) + { + gimple_stmt_iterator gsi = bsi; + gsi_next (&bsi); + gsi_remove (&gsi, true); + } + else + { + gsi_next (&bsi); + stmts[gimple_uid (stmt)] = stmt; + } + } + bsi = gsi_start_bb (bb); + while (!gsi_end_p (bsi)) { gimple stmt = gsi_stmt (bsi); /* If we're recompiling LTO objects with debug stmts but