On Tue, Mar 10, 2026 at 6:08 PM Andres Freund <[email protected]> wrote:
> You don't need a queuing workload to occasionally have a few pages of dead
> tuples at one end of the value range. E.g. a small batch insert that rolls
> back due to a unique conflict is enough. The insert case is also the one
> where, without get_actual_variable_range(), we will often end up with
> completely bogus estimates, as obviously the stored stats won't yet know about
> newly inserted data.

True, but why would you expect the problem to be any worse with the
approach that I've proposed?

If we conservatively assume that each leaf page has 6 distinct heap
pages, as is the case with pgbench_accounts_pkey (which is very low
and thus favors your argument), we can expect to examine about 2.5 x 6
= 15 heap pages before giving up, with the approach I'm proposing.
That's not all that different from the current behavior.
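To make the back-of-the-envelope arithmetic explicit (both inputs are illustrative assumptions taken from the discussion above, not measured constants):

```python
# Rough estimate of heap pages examined before giving up, under the
# proposed scheme. Both inputs are assumptions from this thread:
leaf_pages_budget = 2.5    # leaf pages scanned before giving up
heap_pages_per_leaf = 6    # distinct heap pages per leaf page, as
                           # observed for pgbench_accounts_pkey

expected_heap_pages = leaf_pages_budget * heap_pages_per_leaf
print(expected_heap_pages)  # 15.0
```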

Besides all this, the scenario you're describing is unlikely to last
long. Clients encountering the problem you describe tend to repeat the
transactions that initially failed. All it takes is another
transaction that inserts another tuple into the rightmost leaf page,
or one close by. This will immediately make it possible for
get_actual_variable_range to find a valid extremal value in the index
once more.

> > get_actual_variable_range exists to ascertain information about a
> > particular index. To me, it makes perfect sense that a mechanism such
> > as this would work in terms of costs paid on the index AM side. (The
> > heap fetches matter a great deal too, of course, but it's reasonable
> > to use index costs as a proxy for heap/table AM costs -- but it's not
> > reasonable to use table/heap AM costs as a proxy for index AM costs.
> > It doesn't work both ways.)
>
> I am not convinced it's true either direction - tids from three leaf pages of
> tids can point to a very small number of heap pages or hundreds of heap
> pages.

I was unclear about this point.

There is definitely a wide natural variation in how many distinct heap
blocks appear on each leaf page. And the approach I have in mind is
almost completely indifferent to that. Though, of course, it still
imposes a natural ceiling on how many heap pages can be read: there
can only be so many distinct heap blocks on one leaf page (that was
what I referred to above).

I believe that it's actually a good thing that my approach doesn't
*directly* care about the number of heap accesses. Ignoring the
natural variation in heap layout will produce more consistent behavior
over time, and make it less likely that we'll give up too early. This
is helped by the fact that we can set LP_DEAD bits on dead tuples in
passing (and by the natural ceiling).

> Why is the index AM cost a good proxy for the table? If anything it's
> more sane the other way round (i.e. how it is today).

At a high level we need to reason about the general conditions in the
index without ever doing excessive work. The boundaries between leaf
pages and leaf page density provide relevant hints about the overall
picture in the index.

If we see a leaf page with one 16-byte index tuple, it is very
reasonable to surmise that it once held hundreds. To a lesser degree,
it is valid to expect that its sibling leaf pages will also have very
few index tuples, likely due to a sparse deletion pattern. Indexes are
logically organized into a single key space, so making these kinds of
inferences is much more reasonable than it would be when looking at a
group of physically contiguous heap pages.

By the same token, if we encounter a small run of logically contiguous
leaf pages that are all completely empty, it's a pathological case;
the chances of there being many more empty leaf pages after that shoot
up. The more empty leaf pages we see, the more we should expect to
see. As I keep pointing out, all we need to do is find a single index
tuple -- which shouldn't be hard at all, barring these truly
pathological cases.
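One way to make the "the more empty leaf pages we see, the more we should expect to see" intuition concrete is a simple rule-of-succession model. This is purely illustrative on my part, not anything in the patch:

```python
# Illustrative only: Laplace rule-of-succession estimate of the
# probability that the *next* leaf page is also empty, given how many
# empty (and non-empty) pages the scan has seen so far.
def p_next_empty(empty_seen: int, nonempty_seen: int = 0) -> float:
    """Estimated probability that the next leaf page is empty."""
    return (empty_seen + 1) / (empty_seen + nonempty_seen + 2)

for k in (0, 1, 3, 10):
    print(k, round(p_next_empty(k), 2))
# 0 0.5
# 1 0.67
# 3 0.8
# 10 0.92
```

Each additional all-empty leaf page pushes the estimate up, which is why a short run of empty pages is a strong hint that the case is pathological and giving up is reasonable.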

I'm trying to be practical here. I want to design a solution
compatible with the changes that we're making to the table AM
interface. That fixes Mark's complaint. Ideally, this should avoid
adding much new code to hot code paths. Any design that meets those
goals is acceptable to me.

--
Peter Geoghegan

