On 17.05.2013 12:35, Andres Freund wrote:
> On 2013-05-17 10:45:26 +0300, Heikki Linnakangas wrote:
>> On 16.05.2013 04:15, Andres Freund wrote:
>>> Couldn't we "just" take the extension lock and then walk backwards from
>>> the rechecked end of the relation, calling ConditionalLockBufferForCleanup()
>>> on each buffer?
>>> For every page we manage to lock, we check whether it's still empty. As
>>> soon as we find a page that we couldn't lock or that isn't empty, or once
>>> we've locked a sufficient number of pages, we truncate.
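The backward walk proposed above can be sketched as a small, self-contained C simulation. Everything in it is hypothetical: `SimPage`, `sim_conditional_lock_for_cleanup()` and `sim_find_truncation_point()` are illustrative stand-ins, not PostgreSQL code, and the pin test only loosely models ConditionalLockBufferForCleanup() (succeed without waiting only when no other backend holds a pin). It assumes the caller already holds the relation extension lock, so the relation cannot grow during the walk; as the reply below points out, that alone does not stop an older scan from trying to read the truncated pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of one heap page (not a real PostgreSQL type). */
typedef struct SimPage
{
    int  other_pins;      /* pins held by other backends */
    bool empty;           /* no tuples left on the page */
    bool cleanup_locked;  /* we hold a cleanup lock on it */
} SimPage;

/*
 * Loosely models ConditionalLockBufferForCleanup(): never waits, and only
 * succeeds if nobody else has the page pinned.
 */
static bool
sim_conditional_lock_for_cleanup(SimPage *page)
{
    if (page->other_pins > 0)
        return false;
    page->cleanup_locked = true;
    return true;
}

/*
 * Walk backwards from the (rechecked) end of the relation. Stop at the first
 * page we cannot lock, the first non-empty page, or once max_pages pages are
 * locked. Returns the new length in blocks; pages at or beyond it can be
 * truncated (they stay cleanup-locked here until the caller truncates).
 * Assumes the caller holds the extension lock, so nblocks cannot grow.
 */
static size_t
sim_find_truncation_point(SimPage *pages, size_t nblocks, size_t max_pages)
{
    size_t new_nblocks = nblocks;
    size_t locked = 0;

    while (new_nblocks > 0 && locked < max_pages)
    {
        SimPage *page = &pages[new_nblocks - 1];

        if (!sim_conditional_lock_for_cleanup(page))
            break;                      /* pinned by someone else: stop */
        if (!page->empty)
        {
            page->cleanup_locked = false;  /* keep this page, release lock */
            break;
        }
        new_nblocks--;
        locked++;
    }
    return new_nblocks;
}
```

Stopping at the first failure, rather than skipping over it, keeps the truncatable region a contiguous tail of the relation, which is the only shape a truncation can remove.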

>> You need an AccessExclusiveLock on the relation to make sure that after you
>> have checked that pages 10-15 are empty, and truncated them away, a backend
>> doesn't come along a few seconds later and try to read page 10 again. There
>> might be an old sequential scan in progress, for example, that thinks that
>> the pages are still there.

> But that seems easily enough handled: we know the current page of its
> scan cannot be removed since it's pinned. So make
> heapgettup()/heapgetpage() pass something like RBM_IFEXISTS to
> ReadBuffer() and, if the read fails, recheck the length of the relation
> before throwing an error.
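A rough sketch of the RBM_IFEXISTS idea, again as a simulation under stated assumptions: RBM_IFEXISTS is only proposed here and is not an existing ReadBufferMode, and `SimRelation`, `sim_read_block_if_exists()`, `sim_current_nblocks()` and `sim_seqscan()` are hypothetical names. The scan remembers the length it saw when it started; when a block can't be read, it rechecks the current length and ends quietly if the block was truncated away, and only otherwise reports an error. It assumes a truncation never removes the page the scan currently has pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical relation model: its length can shrink under a running scan. */
typedef struct SimRelation
{
    size_t nblocks;     /* current physical length in blocks */
} SimRelation;

/* Stand-in for a ReadBuffer() call in a (proposed) RBM_IFEXISTS mode. */
static bool
sim_read_block_if_exists(const SimRelation *rel, size_t blkno)
{
    return blkno < rel->nblocks;
}

/* Stand-in for re-reading the relation length. */
static size_t
sim_current_nblocks(const SimRelation *rel)
{
    return rel->nblocks;
}

/*
 * Scan blocks [0, scan_nblocks), scan_nblocks being the length seen at scan
 * start. To mimic a concurrent truncation, the relation is shrunk to
 * truncate_to blocks right after block trunc_after has been read (assumed
 * truncate_to > trunc_after, since the current page is pinned).
 * Returns the number of blocks read, or -1 where real code would ERROR.
 */
static long
sim_seqscan(SimRelation *rel, size_t scan_nblocks,
            size_t trunc_after, size_t truncate_to)
{
    long nread = 0;

    for (size_t blkno = 0; blkno < scan_nblocks; blkno++)
    {
        if (!sim_read_block_if_exists(rel, blkno))
        {
            /* Missing block: was it truncated away concurrently? */
            if (blkno >= sim_current_nblocks(rel))
                break;      /* yes: every later block is gone as well */
            return -1;      /* no: genuine error (unreachable in this model) */
        }
        nread++;
        if (blkno == trunc_after)
            rel->nblocks = truncate_to;   /* simulated concurrent truncation */
    }
    return nread;
}
```

In this simplified model a read only fails past the end of the relation, so the error branch is never taken; it marks where the real code would still raise an error.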

Hmm. For the above to work, you'd need to atomically check that the pages you're truncating away are not pinned, and truncate them. If those steps are not atomic, a backend might pin a page after you've checked that it's not pinned, but before you've truncated the underlying file. I guess that would be doable; it needs some new infrastructure in the buffer manager, however.
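The atomicity requirement can be illustrated with a hypothetical buffer-manager model in which one lock protects both the pin counts and the relation length, so no backend can pin a block between the "not pinned" check and the truncation. `SimBufMgr`, `sim_pin()`, `sim_unpin()` and `sim_truncate_if_unpinned()` are invented for illustration; a single global spinlock is merely the simplest way to show the invariant, not a proposal for what the real buffer manager infrastructure should look like.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_BLOCKS 16

/* Hypothetical model: one lock covers both the pin counts and the length. */
typedef struct SimBufMgr
{
    atomic_flag lock;
    size_t      nblocks;
    int         pins[SIM_MAX_BLOCKS];
} SimBufMgr;

static void
sim_acquire(SimBufMgr *bm)
{
    while (atomic_flag_test_and_set(&bm->lock))
    {
        /* spin until the holder releases the lock */
    }
}

static void
sim_release(SimBufMgr *bm)
{
    atomic_flag_clear(&bm->lock);
}

static void
sim_bufmgr_init(SimBufMgr *bm, size_t nblocks)
{
    assert(nblocks <= SIM_MAX_BLOCKS);
    atomic_flag_clear(&bm->lock);
    bm->nblocks = nblocks;
    for (size_t b = 0; b < SIM_MAX_BLOCKS; b++)
        bm->pins[b] = 0;
}

/* Pin a block; fails if the block has already been truncated away. */
static bool
sim_pin(SimBufMgr *bm, size_t blkno)
{
    bool ok;

    sim_acquire(bm);
    ok = blkno < bm->nblocks;
    if (ok)
        bm->pins[blkno]++;
    sim_release(bm);
    return ok;
}

static void
sim_unpin(SimBufMgr *bm, size_t blkno)
{
    sim_acquire(bm);
    assert(bm->pins[blkno] > 0);
    bm->pins[blkno]--;
    sim_release(bm);
}

/*
 * Truncate to new_nblocks only if none of the blocks being removed is
 * pinned. The check and the truncation happen in one critical section, so
 * a concurrent sim_pin() sees either the old length or the new one.
 */
static bool
sim_truncate_if_unpinned(SimBufMgr *bm, size_t new_nblocks)
{
    bool ok = true;

    sim_acquire(bm);
    for (size_t b = new_nblocks; b < bm->nblocks; b++)
    {
        if (bm->pins[b] > 0)
        {
            ok = false;
            break;
        }
    }
    if (ok && new_nblocks < bm->nblocks)
        bm->nblocks = new_nblocks;
    sim_release(bm);
    return ok;
}
```

For example, while a block past the new end is pinned, sim_truncate_if_unpinned() returns false and leaves the length unchanged; once that block is unpinned the truncation succeeds, and later attempts to pin it fail.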

> There isn't much besides seqscans that can have that behaviour afaics:
> - (bitmap)indexscans et al. won't point to completely empty pages
> - there cannot be a concurrent vacuum since we have the appropriate
>    locks
> - if a trigger or something else has a tid referencing a page, there
>    must be unremovable tuples on it.
>
> The only other thing I immediately see is tidscans, which should be
> handleable in a similar manner to seqscans.
>
> Sure, there are some callsites that need to be adapted, but it still
> seems noticeably easier than what you proposed upthread.

Yeah. I'll think some more about how the required buffer manager changes could be done.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)