Greetings,

* Peter Geoghegan (p...@bowt.ie) wrote:
> On Wed, Jun 16, 2021 at 12:06 PM Andres Freund <and...@anarazel.de> wrote:
> > > I would think that it wouldn't really matter inside VACUUM -- it would
> > > only really need to be either an opportunistic pruning or an
> > > opportunistic index deletion thing -- probably both. Most of the time
> > > VACUUM doesn't seem to end up doing most of the work of removing
> > > garbage versions. It's mostly useful for "floating garbage", to use
> > > the proper GC memory management term.
> >
> > I don't fully agree with this. For one, there are workloads where VACUUM
> > removes the bulk of the dead tuples.
>
> It's definitely much more important that VACUUM run often when non-HOT
> updates are the norm, and there are lots of them. But that's probably
> not going to be helped all that much by this technique anyway.
I don't follow this argument. Surely there are many, many cases out
there where there are very few HOT updates but lots of non-HOT updates
which create lots of dead rows that can't currently be cleaned up when
there's a long-running transaction hanging around.

> Mostly I'm just saying I'd start elsewhere and do heapam later. And
> probably do VACUUM itself last of all, if that usefully cut scope.

Not quite following what 'elsewhere' means here, or what it would
entail if it involves cleaning up dead tuples but doesn't involve
heapam. I can sort of follow the idea of working on the routine
page-level cleanup of tuples rather than VACUUM, except that would
seem to require one to deal with the complexities of ctid chains
discussed below, and therefore be a larger and more complicated effort
than tackling VACUUM and perhaps, in the first round, cutting scope by
explicitly ignoring ctid chains.

> > For another, slowing down VACUUM
> > can cause a slew of follow-on problems, so being careful to not
> > introduce new bottlenecks is important. And I don't think just doing
> > this optimization as part of on-access pruning is reasonable
> > solution. And it's not like making on-access pruning slower is
> > unproblematic either.

I don't know that slowing down VACUUM, which already goes purposefully
slow by default when run from autovacuum, needs to really be stressed
over, particularly when what we're talking about here are CPU cycles.
I do think it'd make sense to have a heuristic which decides if we're
going to put in the effort to try to do this kind of pruning. That is,
if the global xmin and the current transaction are only a few thousand
XIDs apart, or something along those lines, then don't bother, but if
there have been hundreds of thousands of transactions in between then
enable it (perhaps allowing control over this, or allowing users to
explicitly ask VACUUM to 'work harder' or such).
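To make that heuristic concrete, here's a minimal standalone C sketch.
The threshold value and all names here are invented for illustration
(this is not from any patch); the wraparound-aware distance uses plain
modulo-2^32 unsigned arithmetic in the spirit of PostgreSQL's XID
comparisons:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical cutoff: only attempt the extra pruning work once the
 * global xmin has fallen this many XIDs behind.  Purely illustrative. */
#define AGGRESSIVE_PRUNE_XID_THRESHOLD 100000

/* Wraparound-aware distance: how far next_xid has advanced past xmin.
 * Unsigned subtraction gives the right answer modulo 2^32. */
static uint32_t
xid_distance(TransactionId xmin, TransactionId next_xid)
{
    return (uint32_t) (next_xid - xmin);
}

/* The heuristic sketched above: skip the extra work when the global
 * xmin is only a few thousand XIDs behind, enable it once hundreds of
 * thousands of transactions have passed. */
static bool
should_try_aggressive_pruning(TransactionId global_xmin,
                              TransactionId next_xid)
{
    return xid_distance(global_xmin, next_xid) >= AGGRESSIVE_PRUNE_XID_THRESHOLD;
}
```

A GUC (or a VACUUM option for the explicit 'work harder' case) could
then feed the threshold, but the decision itself stays this cheap: one
subtraction and one comparison per heap page or per VACUUM invocation.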
> I think that on-access pruning is much more important because it's the
> only hope we have of keeping the original heap page intact, in the
> sense that there are no non-HOT updates over time, though there may be
> many HOT updates. And no LP_DEAD items ever accumulate. It's not so
> much about cleaning up bloat as it is about *preserving* the heap
> pages in this sense.
>
> If in the long run it's impossible to keep the page intact in this
> sense then we will still have most of our current problems. It might
> not make that much practical difference if we simply delay the problem
> -- we kinda have to prevent it entirely, at least for a given
> workload. So I'm mostly concerned about keeping things stable over
> time, at the level of individual pages.

I do think that's a worthwhile goal, but if we could get some kind of
cleanup happening, that strikes me as better than the nothing we have
today. Which side makes sense to tackle first is certainly a
discussion that could be had, but I'd go for "do the simple thing
first".

> > But as I said nearby, I think the hardest part is figuring out how to
> > deal with ctid chains, not the efficiency of the xid->visibility lookup
> > (or the collection of data necessary for that lookup).
>
> Definitely true.

It strikes me that stressing over ctid chains, while certainly
something to consider, is at this point putting the cart before the
horse in this discussion: there's not much sense in it if we haven't
actually got the data collection piece figured out and working (and
hopefully in a manner that minimizes the overhead from it), and then
worked out the logic to figure out if a given tuple is actually
visible to any running transaction. As I say above, it seems like it'd
be a great win even if it was initially only able to deal with
'routine'/non-chained cases and only with VACUUM.
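For the "is this tuple visible to any running transaction" piece, a
minimal self-contained C sketch of the idea might look like the
following. Everything here is a simplified stand-in (the real rules
live in PostgreSQL's snapshot/visibility code and also handle
subtransactions and XID wraparound, which this deliberately ignores);
the point is just the shape of the finer-grained test versus the blunt
"xmax older than the global xmin" rule:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Simplified snapshot: everything before xmin is seen as resolved,
 * everything at or after xmax is not yet visible, and the xids in
 * xip[] were still in progress when the snapshot was taken. */
typedef struct
{
    TransactionId xmin;
    TransactionId xmax;
    const TransactionId *xip;
    size_t       xcnt;
} SimpleSnapshot;

/* Does this snapshot see 'xid' as committed? */
static bool
xid_visible_in_snapshot(TransactionId xid, const SimpleSnapshot *snap)
{
    if (xid < snap->xmin)
        return true;
    if (xid >= snap->xmax)
        return false;
    for (size_t i = 0; i < snap->xcnt; i++)
    {
        if (snap->xip[i] == xid)
            return false;       /* was still in progress */
    }
    return true;
}

/* A deleted (non-chained) tuple version is removable when every active
 * snapshot sees the deleting xid as committed -- i.e. no running
 * transaction can still see the old version, even if the deleter is
 * newer than the global xmin. */
static bool
dead_tuple_removable(TransactionId deleter_xid,
                     const SimpleSnapshot *snaps, size_t nsnaps)
{
    for (size_t i = 0; i < nsnaps; i++)
    {
        if (!xid_visible_in_snapshot(deleter_xid, &snaps[i]))
            return false;
    }
    return true;
}
```

This also shows why ctid chains are a separable problem: for the
routine INSERT/DELETE case the check above is per-tuple, while a chain
of updated versions would require walking the chain and applying the
same test to each link.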
The kind of queue tables that I'm thinking of, at least, are ones like
what PgQ uses:

https://github.com/pgq/pgq

Now, that already works around our lacking here by using TRUNCATE and
table rotation, but if we improved here then it'd potentially be able
to be rewritten to use routine DELETEs instead of TRUNCATE. Even the
UPDATEs which are done to process a batch for a subscriber look to be
non-HOT due to updating indexed fields anyway (in
pgq.next_batch_custom(), it's setting subscription.sub_batch, which
has a UNIQUE btree index on it). Looks like there's a HOT UPDATE for
the queue table when a table swap happens, but that UPDATE wouldn't
actually be necessary if we'd fix the issue with just routine
INSERT/DELETE leading to tons of dead tuples that can't be VACUUM'd
while a long-running transaction is running, and I doubt that UPDATE
was actually intentionally designed to take advantage of HOT; it just
happened to work that way.

The gist of what I'm trying to get at here is that, in the use-cases
I've seen where people have put in the effort to work around the
long-running transaction vs. VACUUM issue with hacks like table
swapping and TRUNCATE, there isn't a lot of HOT updating happening on
the tables that are getting bloated due to VACUUM being unable to
clean up tuples. So, if that's actually the easier thing to tackle,
fantastic: let's do it and then figure out how to improve on it to
handle the more complicated cases later. (This presumes that it's
actually possible to essentially 'skip' the hard cases and still have
a working implementation, of course.)

Thanks,

Stephen