Re: Problems with get_actual_variable_range's VISITED_PAGES_LIMIT

Andres Freund Tue, 10 Mar 2026 15:08:56 -0700

Hi,

On 2026-03-10 17:03:14 -0400, Peter Geoghegan wrote:
> On Tue, Feb 10, 2026 at 4:26 AM Jakub Wartak
> <[email protected]> wrote:
> > I just worry that if get_actual_variable_range() starts reporting much worse
> > estimates what do we do then? Well maybe, we'll have pg_plan_advice then,
> > or maybe the depth (time?) of search should be somehow bound to the column's
> > statistics_target too, but on second thought it looks like it should be more
> > property of the query itself (so maybe some GUC?)
> 
> The purpose of get_actual_variable_range is to get the extremal value
> in the index -- a single value from a single tuple. If we can't manage
> that by reading only 2 or 3 pages, then I see no good reason to
> believe we can by reading many more pages.
> 
> In other words, we're perfectly justified in making a soft expectation
> that it'll require very little work to get an extremal value. But once
> we notice that our soft assumption doesn't hold, all bets are off --
> we're then justified in giving up relatively quickly. Having several
> contiguous pages that are completely empty (or empty of
> non-LP_DEAD-marked tuples) at the extreme leftmost or rightmost en of
> the index is absolutely a pathological case. It strongly suggests a
> queue-like workload of the kind that my testing shows that
> VISITED_PAGES_LIMIT can't deal with sensibly.


You don't need a queuing workload to occasionally have a few pages of dead
tuples at one end of the value range. E.g. a small batch insert that rolls
back due to a unique conflict is enough. The insert case is also the one
where, without get_actual_variable_range(), we will often end up with
completely bogus estimates, as obviously the stored stats won't yet know about
newly inserted data.


> get_actual_variable_range exists to ascertain information about a
> particular index. To me, it makes perfect sense that a mechanism such
> as this would work in terms of costs paid on the index AM side. (The
> heap fetches matter a great deal too, of course, but it's reasonable
> to use index costs as a proxy for heap/table AM costs -- but it's not
> reasonable to use table/heap AM costs as a proxy for index AM costs.
> It doesn't work both ways.)

I am not convinced it's true either direction - tids from three leaf pages of
tids can point to a very small number of heap pages or hundreds of heap
pages. Why is the index AM cost a good proxy for the table? If anything it's
more sane the other way round (i.e. how it is today).

Greetings,

Andres Freund

Re: Problems with get_actual_variable_range's VISITED_PAGES_LIMIT

Reply via email to