On Tue, 23 Aug 2011, Jakub Jelinek wrote:
On Tue, Aug 23, 2011 at 02:40:56PM +0300, Dimitrios Apostolou wrote:
dst->vars = (shared_hash) pool_alloc (shared_hash_pool);
dst->vars->refcount = 1;
dst->vars->htab
- = htab_create (MAX (src1_elems, src2_elems), variable_htab_hash,
+ = htab_create (2 * MAX (src1_elems, src2_elems), variable_htab_hash,
variable_htab_eq, variable_htab_free);
This looks wrong, 2 * max is definitely too much.
For a hash table to fit N elements, it has to have at least 4/3*N
slots, or 2*N slots if htab has the 50% load factor I was proposing.
For var-tracking, 50% load factor is IMHO a very bad idea, memory
consumption of var-tracking is already high right now, we IMHO don't have
the luxury to waste more RAM.
Agreed, then I 'll change it to 4/3 * MAX so that I avoid expansions.
In total for dataflow_set_destroy I can see that calls to
attrs_list_clear() have been reduced from 500K to 250K, and I can
also see a reduction of free() calls from htab_delete(), from 30K to
free calls? If you avoid free calls, then you end up with a memory leak.
I can understand when you avoid pool_free calls...
You are right, I just mentioned the total difference in free() calls from
all my patches. But in this part there is no free() involved, so the major
gain should be from avoiding htab_delete() iterating many times over the
hash tables. Annotated source from the callgrind profiler (shows
instruction count):
Before:
. void
15,820,597 htab_delete (htab_t htab)
250,914 {
41,819 size_t size = htab_size (htab);
41,819 PTR *entries = htab->entries;
. int i;
.
83,638 if (htab->del_f)
66,493,884 for (i = size - 1; i >= 0; i--)
49,825,998 if (entries[i] != HTAB_EMPTY_ENTRY && entries[i] !=
HTAB_DELETED_ENTRY)
1,813,018 (*htab->del_f) (entries[i]);
1,800,731 => tree-into-ssa.c:def_blocks_free (10988x)
345,910 => tree-into-ssa.c:repl_map_free (2815x)
53,950 => tree-scalar-evolution.c:del_scev_info (1197x)
139,354 => tree-ssa-loop-im.c:memref_free (198x)
281,777 => tree-ssa-sccvn.c:free_phi (3788x)
81,726 => tree-ssa-uncprop.c:equiv_free (463x)
17,359,572 => var-tracking.c:variable_htab_free (835512x)
42,161 => cfgloop.c:loop_exit_free (720x)
284,904 => ira-costs.c:cost_classes_del (2270x)
157 => tree-ssa-loop-im.c:vtoe_free (1x)
11,454,221 => ???:free (37228x)
1,460,684 => tree-ssa-sccvn.c:free_reference (11329x)
.
125,457 if (htab->free_f != NULL)
After:
. void
6,543,474 htab_delete (htab_t htab)
250,914 {
41,819 size_t size = htab_size (htab);
41,819 PTR *entries = htab->entries;
. int i;
.
83,638 if (htab->del_f)
29,288,268 for (i = size - 1; i >= 0; i--)
21,927,330 if (entries[i] != HTAB_EMPTY_ENTRY && entries[i] !=
HTAB_DELETED_ENTRY)
1,738,584 (*htab->del_f) (entries[i]);
139,344 => tree-ssa-loop-im.c:memref_free (198x)
157 => tree-ssa-loop-im.c:vtoe_free (1x)
81,762 => tree-ssa-uncprop.c:equiv_free (463x)
281,884 => tree-ssa-sccvn.c:free_phi (3788x)
42,145 => cfgloop.c:loop_exit_free (720x)
345,870 => tree-into-ssa.c:repl_map_free (2815x)
1,462,000 => tree-ssa-sccvn.c:free_reference (11329x)
1,800,951 => tree-into-ssa.c:def_blocks_free (10988x)
16,518,004 => var-tracking.c:variable_htab_free (824010x)
284,979 => ira-costs.c:cost_classes_del (2270x)
9,080,245 => ???:free (11513x)
53,921 => tree-scalar-evolution.c:del_scev_info (1197x)
.
125,457 if (htab->free_f != NULL)
So if the part relevant to this patch is the number of calls to
variable_htab_free, I suppose it won't make a big difference.
But if I take a look at the top callers and top callees of htab_delete():
Before:
19,590,149 < tree-ssa-structalias.c:solve_constraints (2058x) [cc1]
33,299,064 < var-tracking.c:shared_hash_destroy (6923x) [cc1]
68,941,719 < tree-ssa-pre.c:execute_pre (2058x) [cc1]
134,915,334 * hashtab.c:htab_delete [cc1]
17,359,572 > var-tracking.c:variable_htab_free (835512x) [cc1]
11,454,221 > ???:free (108456x) [libc-2.12.so]
1,800,731 > tree-into-ssa.c:def_blocks_free (10988x) [cc1]
After:
8,756,328 < tree-ssa-structalias.c:delete_points_to_sets (1029x) [cc1]
9,316,165 < tree-ssa-dom.c:tree_ssa_dominator_optimize (562x) [cc1]
83,696,211 < var-tracking.c:shared_hash_destroy (6923x) [cc1]
60,459,493 * hashtab.c:htab_delete [cc1]
16,518,004 > var-tracking.c:variable_htab_free (824010x) [cc1]
9,080,245 > ???:free (82741x) [libc-2.12.so]
1,800,951 > tree-into-ssa.c:def_blocks_free (10988x) [cc1]
Then I'm more confused, htab_delete() seems to be much colder, probably
due to less traversals of hash tables, but It's not clear how much
var-tracking changes are responsible for this. I'll keep in mind to revert
the do_free parameter and report back. I guess for less than 10 M instr
it's not worth the hassle.
Thanks,
Dimitris