On Fri, May 7, 2021 at 9:16 AM Justin Pryzby <pry...@telsasoft.com> wrote:
> In pg13, indexes are de-duplicated by default.
>
> But I suspect the performance is better because data was reload, and the smaller indexes are a small, additional benefit.
That's a very reasonable interpretation, since the bitmap index scans themselves just aren't doing that much I/O -- we see that there is much more I/O for the heap scan, which is likely to be what the general picture looks like no matter how much bloat there is.

However, I'm not sure if that reasonable interpretation is actually correct. The nbtinsert.c code that handles deleting LP_DEAD index tuples no longer relies on having a page-level garbage item flag set in Postgres 13 -- it just scans the line pointer array for LP_DEAD items each time. VACUUM has a rather unhelpful tendency to unset the flag when it shouldn't, which we're no longer affected by. So that's one possible explanation.

Another possible explanation is that smaller indexes (due to deduplication) are more likely to get index scans, which leads to setting the LP_DEAD bit of known-dead index tuples in passing more often (bitmap index scans won't do the kill_prior_tuple optimization). There could even be a virtuous circle over time. (Note that the index deletion stuff in Postgres 14 pretty much makes sure that this happens, but it is probably at least much more likely in Postgres 13 compared to 12.)

I could easily be very wrong about all of this in this instance, though, because the behavior I've described is highly non-linear and therefore highly unpredictable in general (not to mention highly sensitive to workload characteristics). I'm sure that I've thought about this stuff way more than any other individual Postgres contributor, but it's easy to be wrong in any given instance. The real explanation might be something else entirely. Though it's hard not to imagine that what really matters here is avoiding all of that bitmap heap scan I/O.

--
Peter Geoghegan
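
For concreteness, a minimal sketch of the kind of LP_DEAD line pointer scan described above. This is not the actual nbtinsert.c code, and the helper name collect_dead_offsets is made up for illustration -- the point is just that every line pointer on the leaf page gets checked with ItemIdIsDead(), with no dependence on a page-level BTP_HAS_GARBAGE-style hint that VACUUM might have cleared:

/* Illustrative sketch only -- not the actual nbtinsert.c code */
#include "postgres.h"

#include "access/nbtree.h"
#include "storage/bufpage.h"

/* Hypothetical helper: collect offsets of LP_DEAD items on a leaf page */
static int
collect_dead_offsets(Page page, OffsetNumber *deadoffsets)
{
    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    OffsetNumber minoff = P_FIRSTDATAKEY(opaque);
    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
    OffsetNumber offnum;
    int          ndead = 0;

    for (offnum = minoff; offnum <= maxoff; offnum = OffsetNumberNext(offnum))
    {
        ItemId      itemid = PageGetItemId(page, offnum);

        /* no page-level "has garbage" flag consulted -- just the item itself */
        if (ItemIdIsDead(itemid))
            deadoffsets[ndead++] = offnum;
    }

    return ndead;
}

A caller would size deadoffsets with MaxIndexTuplesPerPage and, when the return value is greater than zero, hand the collected offsets to the regular index tuple deletion path.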