On 6/8/26 15:32, Andrei Lepikhov wrote: > On 07/06/2026 20:10, Tomas Vondra wrote: >> On 6/5/26 12:43, Tomas Vondra wrote: >> After classifying the allocations a bit, I see this: >> >> allocation | size >> --------------------------------------+---------------- >> find_mergeclauses_for_outer_pathkeys | 671 MB >> generate_join_implied_equalities | 660 MB >> add_paths_to_joinrel | 242 MB >> make_inner_pathkeys_for_merge | 195 MB >> estimate_num_groups | 92 MB >> get_joinrel_parampathinfo | 90 MB >> create_memoize_path | 48 MB >> | 15 MB >> calc_nestloop_required_outer | 5226 kB >> (9 rows) > > It is fun to use the benchmark script from the previous research on this topic > [1], conducted in 2009. Query, consuming a lot of memory and planning time > [2], > now shows the following numbers: > > join_collapse_limit = 12: Memory: used=0.8GB, Time: 2.5s > join_collapse_limit = 14: Memory: used=1.1GB, Time: 5.5s > join_collapse_limit = 18: Memory: used=9.3GB, Time: 50.6s > > It's interesting to compare these results with your memory consumption > profile. > I tried using the memtrace patch and the Python script directly, but they gave > unusual output on my machine, so I couldn't rely on them without more detailed > instructions. >
Yeah, the memtrace patch is very hacky, and I haven't given enough instructions how to use it. The .patch adds elog(LOG) with the MEMTRACE information (context, action, chunk, size), and the memtrace.py aggregates it like this: grep MEMTRACE pg.log | ./memtrace.py > memtrace.log But the heaptrack seems more convenient, of course. > Instead, I used the heaptrack tool, which gave me the following profiles: > bytes pct function > collapse limit = 14: > 354.48 MB 29.3% get_relation_foreign_keys > 109.07 MB 9.0% generate_join_implied_equalities_normal > 75.51 MB 6.2% get_eclass_indexes_for_relids > 67.12 MB 5.5% find_mergeclauses_for_outer_pathkeys > 67.12 MB 5.5% create_nestloop_path > 58.73 MB 4.9% make_inner_pathkeys_for_merge > 38.27 MB 3.2% expression_tree_mutator_impl > 33.56 MB 2.8% hash_inner_and_outer > collapse limit = 18: > 1.23 GB 12.8% generate_join_implied_equalities_normal > 1.11 GB 11.5% get_joinrel_parampathinfo > 989.90 MB 10.3% get_eclass_indexes_for_relids > 755.04 MB 7.9% find_mergeclauses_for_outer_pathkeys > 721.44 MB 7.5% calc_nestloop_required_outer > 721.43 MB 7.5% get_param_path_clause_serials > 629.15 MB 6.5% bms_intersect > 578.81 MB 6.0% have_unsafe_outer_join_ref > 411.08 MB 4.3% make_inner_pathkeys_for_merge > 354.48 MB 3.7% get_relation_foreign_keys > 335.58 MB 3.5% create_nestloop_path > 301.99 MB 3.1% generate_join_implied_equalities > Hmm, how does heaptrack deal with out memory pools? I was worried existing memory profilers (like heaptrack) would get confused by our memory contexts, attributing the whole block to the palloc that just happens to allocate a new block. But that's not really right. I see the heaptrack README claims it can work with memory pools after annotating the code in some way. But there's not much details about that. Also, it suggests valgrind/massif can already do that. > Overall, the results for generate_join_implied_equalities and > find_mergeclauses_for_outer_pathkeys are consistent. This test also highlights > other sources of memory allocations, such as parameterised paths. The memory > profile changes as the number of joins in the 'join problem' increases. > Yes. I did actually mention get_joinrel_parampathinfo. regards -- Tomas Vondra
