On Mon, May 2, 2022 at 7:13 PM David Rowley <dgrowle...@gmail.com> wrote:
> On Tue, 3 May 2022 at 11:02, Tom Lane <t...@sss.pgh.pa.us> wrote: > > > > David Rowley <dgrowle...@gmail.com> writes: > > > On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.steh...@gmail.com> > wrote: > > >> I found a query that is significantly slower with more memory > > > > > If it was work_mem you increased, it seems strange that the plan would > > > switch over to using a Nested Loop / Memoize plan. > > > > Yeah, there's something unexplained there. > > > > I think that the most probable explanation for the symptoms is that > > cost_memoize_rescan is computing some insane value for est_entries, > > causing ExecInitMemoize to allocate-and-zero a huge hash table, > > which ExecEndMemoize then frees again. Neither of those steps > > gets counted into any plan node's runtime, but EXPLAIN's total > > execution time will include them. An insane value for est_entries > > could perhaps go along with a cost misestimate that convinces the > > planner to include the memoize even though it seems pointless. > > That seems pretty unlikely to me. est_entries is based on the minimum > value of the expected number of total cache entries and the ndistinct > value. ndistinct cannot be insane here as ndistinct is never going to > be higher than the number of calls, which is the row estimate from the > outer side of the join. That's 91 in both cases here. As far as I > can see, that's just going to make a table of 128 buckets. > If est_entries goes to zero due to hash_mem_bytes/est_entry_bytes < 1 (hence floor takes it to zero) the executor will use a size value of 1024 instead in build_hash_table. That seems unlikely but there is no data to support or refute it. > I'm open to making improvements to the comments in that area. I do > remember spending quite a bit of time trying to make things as clear > as possible as it is fairly complex what's going on there. > > A few more intermediate calculation variables, along with descriptions, would help. e.g., min(est_cache_entries, ndistinct) is repeated twice after its initial definition. retention_ratio per my other reply The (ndistinct/calls) part of hit_ratio being described specifically. David J.