On Mon, May 9, 2011 at 10:25 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, May 6, 2011 at 5:47 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Mar 30, 2011 at 8:52 AM, Heikki Linnakangas
>> <heikki.linnakan...@enterprisedb.com> wrote:
>>>> Another question:
>>>> To address the problem in
>>>> http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php
>>>> , should we just clear the vm before the log of insert/update/delete?
>>>> This may reduce the performance, is there another solution?
>>>
>>> Yeah, that's a straightforward way to fix it. I don't think the
>>> performance hit will be too bad. But we need to be careful not to hold
>>> locks while doing I/O, which might require some rearrangement of the
>>> code. We might want to do a similar dance that we do in vacuum, and
>>> call visibilitymap_pin first, then lock and update the heap page, and
>>> then set the VM bit while holding the lock on the heap page.
>>
>> Here's an attempt at implementing the necessary gymnastics.
>
> Is there a quick synopsis of why you have to do (sometimes) the
> pin->lock->unlock->pin->lock mechanic? How come you only can fail to
> get the pin at most once?
I thought I'd explained it fairly thoroughly in the comments, but evidently not. Suggestions for improvement are welcome. Here goes in more detail:

Every time we insert, update, or delete a tuple in a particular heap page, we must check whether the page is marked all-visible. If it is, then we need to clear the page-level bit marking it as all-visible, and also the corresponding bit in the visibility map. On the other hand, if the page isn't marked all-visible, then we needn't touch the visibility map at all. So, there are either one or two buffers involved: the buffer containing the heap page ("buffer") and possibly also a buffer containing the visibility map page in which the bit for the heap page is to be found ("vmbuffer").

Before taking an exclusive content-lock on the heap buffer, we check whether the page appears to be all-visible. If it does, then we pin the visibility map page and then lock the buffer. If not, we just lock the buffer. However, since we weren't holding any lock, it's possible that between the time when we checked the visibility map bit and the time when we obtained the exclusive buffer-lock, the visibility map bit might have changed from clear to set (because someone is concurrently running VACUUM on the table; or, on platforms with weak memory-ordering, because someone was running VACUUM "almost" concurrently). If that happens, we give up our buffer lock, go pin the visibility map page, and reacquire the buffer lock. At this point in the process, we know that *if* the page is marked all-visible, *then* we have the appropriate visibility map page pinned.
There are three possible pathways:

(1) If the buffer initially appeared to be all-visible, we will have pinned the visibility map page before acquiring the exclusive lock.

(2) If the buffer initially appeared NOT to be all-visible, but by the time we obtained the exclusive lock it now appeared to be all-visible, then we will have done the unfortunate unlock-pin-relock dance, and the visibility map page will now be pinned.

(3) If the buffer initially appeared NOT to be all-visible, and by the time we obtained the exclusive lock it STILL appeared NOT to be all-visible, then we don't have the visibility map page pinned - but that's OK, because in this case no operation on the visibility map needs to be performed.

Now, it is very possible that in case (1) or (2) the visibility map bit, though we saw it set at some point, will actually have been cleared in the meantime. In case (1), this could happen before we obtain the exclusive lock; while in case (2), it could happen after we give up the lock to go pin the visibility map page and before we reacquire it. This will typically happen when a buffer has been sitting around for a while in an all-visible state and suddenly two different backends both try to update or delete tuples in that buffer at almost exactly the same time. But it causes no great harm - both backends will pin the visibility map page, whichever one gets the exclusive lock on the heap page first will clear the bit, and when the other backend gets the heap page afterwards, it will see that the bit has already been cleared and do nothing further. We've wasted the effort of pinning and unpinning the visibility map page when it wasn't really necessary, but that's not the end of the world.

We could avoid all of this complexity - and the possibility of pinning the visibility map page needlessly - by locking the heap buffer first and then pinning the visibility map page if the heap page is all-visible.
However, that would involve holding the lock on the heap buffer across a possible disk I/O to bring the visibility map page into memory, which is something the existing code tries pretty hard to avoid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)