On Mon, May 2, 2022 at 7:13 PM David Rowley <dgrowle...@gmail.com> wrote:

> On Tue, 3 May 2022 at 11:02, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> > David Rowley <dgrowle...@gmail.com> writes:
> > > On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
> > >> I found a query that is significantly slower with more memory
> >
> > > If it was work_mem you increased, it seems strange that the plan would
> > > switch over to using a Nested Loop / Memoize plan.
> >
> > Yeah, there's something unexplained there.
> >
> > I think that the most probable explanation for the symptoms is that
> > cost_memoize_rescan is computing some insane value for est_entries,
> > causing ExecInitMemoize to allocate-and-zero a huge hash table,
> > which ExecEndMemoize then frees again.  Neither of those steps
> > gets counted into any plan node's runtime, but EXPLAIN's total
> > execution time will include them.  An insane value for est_entries
> > could perhaps go along with a cost misestimate that convinces the
> > planner to include the memoize even though it seems pointless.
>
> That seems pretty unlikely to me. est_entries is based on the minimum
> value of the expected number of total cache entries and the ndistinct
> value. ndistinct cannot be insane here as ndistinct is never going to
> be higher than the number of calls, which is the row estimate from the
> outer side of the join. That's 91 in both cases here.   As far as I
> can see, that's just going to make a table of 128 buckets.
>

If est_entries goes to zero due to hash_mem_bytes/est_entry_bytes < 1
(hence floor takes it to zero) the executor will use a size value of 1024
instead in build_hash_table.

That seems unlikely but there is no data to support or refute it.


> I'm open to making improvements to the comments in that area. I do
> remember spending quite a bit of time trying to make things as clear
> as possible as it is fairly complex what's going on there.
>
>
A few more intermediate calculation variables, along with descriptions,
would help.

e.g., min(est_cache_entries, ndistinct) is repeated twice after its initial
definition.

retention_ratio per my other reply

The (ndistinct/calls) part of hit_ratio being described specifically.

David J.

Reply via email to