On 6/8/26 15:32, Andrei Lepikhov wrote:
> On 07/06/2026 20:10, Tomas Vondra wrote:
>> On 6/5/26 12:43, Tomas Vondra wrote:
>> After classifying the allocations a bit, I see this:
>>
>>            allocation                 | size
>> --------------------------------------+----------------
>>  find_mergeclauses_for_outer_pathkeys | 671 MB
>>  generate_join_implied_equalities     | 660 MB
>>  add_paths_to_joinrel                 | 242 MB
>>  make_inner_pathkeys_for_merge        | 195 MB
>>  estimate_num_groups                  | 92 MB
>>  get_joinrel_parampathinfo            | 90 MB
>>  create_memoize_path                  | 48 MB
>>                                       | 15 MB
>>  calc_nestloop_required_outer         | 5226 kB
>> (9 rows)
> 
> It is fun to use the benchmark script from the previous research on this topic
> [1], conducted in 2009. Query, consuming a lot of memory and planning time 
> [2],
> now shows the following numbers:
> 
> join_collapse_limit = 12: Memory: used=0.8GB, Time: 2.5s
> join_collapse_limit = 14: Memory: used=1.1GB, Time: 5.5s
> join_collapse_limit = 18: Memory: used=9.3GB, Time: 50.6s
> 
> It's interesting to compare these results with your memory consumption 
> profile.
> I tried using the memtrace patch and the Python script directly, but they gave
> unusual output on my machine, so I couldn't rely on them without more detailed
> instructions. 
> 

Yeah, the memtrace patch is very hacky, and I haven't given enough
instructions how to use it. The .patch adds elog(LOG) with the MEMTRACE
information (context, action, chunk, size), and the memtrace.py
aggregates it like this:

grep MEMTRACE pg.log | ./memtrace.py  > memtrace.log

But the heaptrack seems more convenient, of course.

> Instead, I used the heaptrack tool, which gave me the following
profiles:
> bytes     pct  function
> collapse limit = 14:
>  354.48 MB   29.3%  get_relation_foreign_keys
>  109.07 MB    9.0%  generate_join_implied_equalities_normal
>   75.51 MB    6.2%  get_eclass_indexes_for_relids
>   67.12 MB    5.5%  find_mergeclauses_for_outer_pathkeys
>   67.12 MB    5.5%  create_nestloop_path
>   58.73 MB    4.9%  make_inner_pathkeys_for_merge
>   38.27 MB    3.2%  expression_tree_mutator_impl
>   33.56 MB    2.8%  hash_inner_and_outer
> collapse limit = 18:
>    1.23 GB   12.8%  generate_join_implied_equalities_normal
>    1.11 GB   11.5%  get_joinrel_parampathinfo
>  989.90 MB   10.3%  get_eclass_indexes_for_relids
>  755.04 MB    7.9%  find_mergeclauses_for_outer_pathkeys
>  721.44 MB    7.5%  calc_nestloop_required_outer
>  721.43 MB    7.5%  get_param_path_clause_serials
>  629.15 MB    6.5%  bms_intersect
>  578.81 MB    6.0%  have_unsafe_outer_join_ref
>  411.08 MB    4.3%  make_inner_pathkeys_for_merge
>  354.48 MB    3.7%  get_relation_foreign_keys
>  335.58 MB    3.5%  create_nestloop_path
>  301.99 MB    3.1%  generate_join_implied_equalities
> 

Hmm, how does heaptrack deal with out memory pools? I was worried
existing memory profilers (like heaptrack) would get confused by our
memory contexts, attributing the whole block to the palloc that just
happens to allocate a new block. But that's not really right.

I see the heaptrack README claims it can work with memory pools after
annotating the code in some way. But there's not much details about
that. Also, it suggests valgrind/massif can already do that.

> Overall, the results for generate_join_implied_equalities and
> find_mergeclauses_for_outer_pathkeys are consistent. This test also highlights
> other sources of memory allocations, such as parameterised paths. The memory
> profile changes as the number of joins in the 'join problem' increases.
> 

Yes. I did actually mention get_joinrel_parampathinfo.


regards

-- 
Tomas Vondra



Reply via email to