On Fri, May 7, 2021 at 9:16 AM Justin Pryzby <pry...@telsasoft.com> wrote:
> In pg13, indexes are de-duplicated by default.
>
> But I suspect the performance is better because the data was reloaded,
> and the smaller indexes are a small, additional benefit.

That's a very reasonable interpretation, since the bitmap index scans
themselves just aren't doing that much I/O -- we can see that there is
much more I/O for the heap scan, which is likely to be what the
general picture looks like no matter how much bloat there is.

However, I'm not sure that that reasonable interpretation is actually
correct. In Postgres 13, the nbtinsert.c code that handles deleting
LP_DEAD index tuples no longer relies on having the page-level garbage
flag (BTP_HAS_GARBAGE) set -- it just scans the line pointer array for
LP_DEAD items each time. VACUUM has a rather unhelpful tendency to
unset that flag when it shouldn't, which no longer affects us. So
that's one possible explanation.
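To make the Postgres 13 behavior concrete, here is a minimal,
illustrative sketch (not the actual nbtinsert.c code -- the types and
function name are made up for the example) of scanning a page's line
pointer array for LP_DEAD items directly, with no page-level flag
involved:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a page's line pointer array. In Postgres these
 * are ItemIdData entries; LP_DEAD marks an index tuple known to be
 * dead. This is a sketch of the idea only. */
typedef enum { LP_UNUSED, LP_NORMAL, LP_DEAD } LpFlag;

/* Find deletable items by scanning every line pointer each time,
 * the way Postgres 13 does -- correctness no longer depends on a
 * separate "page has garbage" flag staying accurate. */
static size_t
count_lp_dead(const LpFlag *lp, size_t nitems)
{
    size_t ndeletable = 0;

    for (size_t i = 0; i < nitems; i++)
        if (lp[i] == LP_DEAD)
            ndeletable++;
    return ndeletable;
}
```

The point of the scan-every-time approach is that a stale or wrongly
cleared flag can no longer hide deletable items from the deletion
code.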

Another possible explanation is that smaller indexes (due to
deduplication) are more likely to get plain index scans, which set
the LP_DEAD bit of known-dead index tuples in passing more often
(bitmap index scans won't do the kill_prior_tuple optimization).
There could even be a virtuous circle over time. (Note that the index
deletion stuff in Postgres 14 pretty much makes sure that this
happens, but it is probably at least much more likely in Postgres 13
compared to 12.)
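The asymmetry between the two scan types can be sketched like this
(again an illustrative toy model, not Postgres code -- the struct and
function names are invented for the example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an index entry: whether its heap tuple is dead, and
 * whether the index's LP_DEAD hint bit has been set for it. */
typedef struct
{
    bool heap_dead;
    bool lp_dead;
} IndexEntry;

/* Plain index scan: it visits the heap for each entry, so when the
 * heap tuple turns out to be dead it can set LP_DEAD in passing
 * (the kill_prior_tuple optimization, modeled crudely here). */
static void
plain_index_scan(IndexEntry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (entries[i].heap_dead)
            entries[i].lp_dead = true; /* hint for later deletion */
}

/* Bitmap index scan: it only collects TIDs into a bitmap; heap
 * visibility is checked later, so no LP_DEAD feedback reaches the
 * index at all. */
static void
bitmap_index_scan(IndexEntry *entries, size_t n)
{
    (void) entries;
    (void) n;                   /* no hint bits get set */
}
```

So a workload that shifts from bitmap index scans to plain index
scans starts feeding LP_DEAD hints back into the index, which is
where the virtuous circle could come from.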

I could easily be very wrong about all of this in this instance,
though, because the behavior I've described is highly non-linear and
therefore highly unpredictable in general (not to mention highly
sensitive to workload characteristics). I'm sure that I've thought
about this stuff more than any other individual Postgres contributor,
but it's easy to be wrong in any given instance. The real explanation
might be something else entirely -- though it's hard not to imagine
that what really matters here is avoiding all of that bitmap heap
scan I/O.

-- 
Peter Geoghegan
