On Wed, May 1, 2013 at 1:02 PM, Simon Riggs <si...@2ndquadrant.com> wrote: > I agree, but that was in the original coding wasn't it?
I believe the problem was introduced by this commit: commit fdf9e21196a6f58c6021c967dc5776a16190f295 Author: Heikki Linnakangas <heikki.linnakan...@iki.fi> Date: Wed Feb 13 17:46:23 2013 +0200 Update visibility map in the second phase of vacuum. There's a high chance that a page becomes all-visible when the second phase of vacuum removes all the dead tuples on it, so it makes sense to check for that. Otherwise the visibility map won't get updated until the next vacuum. Pavan Deolasee, reviewed by Jeff Janes. > Why aren't we writing just one WAL record for this action? We use a > single WAL record in other places where we make changes to multiple > blocks with multiple full page writes, e.g. index block split. That > would make the action atomic and we'd just have this... > > 1. Perform the cleanup operations on the buffer. > 2. Set the visibility map bit. > 3. Log the cleanup operations and visibility map change. > > which can then be replayed with correct sequence, locking etc. > and AFAICS would likely be faster also. I thought about that, too. It certainly seems like more than we want to try to do for 9.3 at this point. The other complication is that there's a lot of conditional logic here. We're definitely going to emit a cleanup record. We're going to emit a record to make the page all-visible only sometimes, because it might not be all-visible yet: it could have tuples on it that are deleted but not yet dead. And then there's additional logic to handle the checksum case. Plus, the all-visible marking can happen in other code paths, too, specifically in phase 1 of vacuum. So it might be possible to consolidate this, but off-hand it looks messy to me out of proportion to the benefits. Now that I'm looking at this, I'm a bit confused by the new logic in visibilitymap_set(). When checksums are enabled, we set the page LSN, which is described like this: "we need to protect the heap page from being torn". But how does setting the page LSN do that? Don't we need to mark the buffer dirty or something like that? Sorry if this is a dumb question. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers