Re: A performance regression issue with Memoize

David Rowley Mon, 28 Jul 2025 21:31:28 -0700

On Tue, 29 Jul 2025 at 16:01, Richard Guo <[email protected]> wrote:
> Another possible direction is to support runtime plan correction or
> feedback loops.  We've always followed a "plan-first, execute-next"
> approach so far.  But perhaps we could extend that by monitoring plan
> execution and triggering re-optimization when the executor detects
> that actual result sizes or other runtime statistics differ
> significantly from the estimates.  In recent years, there have been
> more and more papers and research on adaptive query processing.  It
> might be worth considering how PostgreSQL could support such
> techniques in the future.

I've recently noticed that some databases are doing things like this.
[1] is an example. For the record, I've absolutely no insight into
what's going on there aside from what's mentioned in the public
documentation. In any case, I don't think that idea is new on us as
there's been discussion before about having some sort of hybrid hash
join / nested loop before in regards to trying to fix the issue with
extra large batches during hash joins.

If we were to adopt something similar, I believe we'd need to have
some sort of estimate on the certainty of the statistics we're working
with, otherwise, how would you know when and when not to use the
adaptive method? There's also the PathKey issue when switching
algorithms. For example, nested loops preserve the outer path's order,
but multi-batch hash joins do not. That may not be an issue when
switching a method that's more strict in terms of row output order,
but it could be when going the other way. That means you don't get the
full flexibility to adapt the plan as you'd get from having the
planner choose the new plan in the first place.

For the record, I 100% agree that there will always be cases where
statistics are just unable to represent what is discovered at
run-time, so having some sort of ability to adapt at run-time seems
like a natural progression on the evolutionary chain. I just don't
know if it's the best or best next step to make. I suspect we might be
skipping a few steps from what we have now if we went there in the
near future. We don't yet have extended statistics for joins yet, for
example.

David

[1]
https://learn.microsoft.com/en-us/sql/relational-databases/performance/joins?view=sql-server-ver17#adaptive

Re: A performance regression issue with Memoize

Reply via email to