On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> On a cursory lock it might just be a race condition in
> vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the
> warning to be visible, all_visible_according_to_vm is determined before we
> loop over all blocks. At the point where one specific heap block is actually
> read and locked that knowledge might be completely outdated by any concurrent
> backend. Am I missing something?

No, I think you're right.  I think that warning is bogus.  I added it
in place of some older warning which no longer made sense, but I think
this one doesn't make sense either.

> I have to say the whole visibilitymap correctness and crash-safety seems to be
> quite under documented, especially as it seems to be somewhat intricate (to
> me). E.g. not having any note why visibilitymap_test doesn't need locking. (I
> guess the theory is that a 1 byte read will always be consistent. But how does
> that ensure other backends see an up2date value?).

It's definitely intricate, and it's very possible that we should have
some more documentation.  I am not sure exactly what and where, but
feel free to suggest something.

visibilitymap_test() does have a comment saying that:

        /*
         * We don't need to lock the page, as we're only looking at a
single bit.
         */

But that's a bit unsatisfying, because, as you say, it doesn't address
the question of memory-ordering issues.  I think that there's no
situation in which it causes a problem to see the visibility map bit
as unset when in reality it has just recently been set by some other
back-end.  It would be bad if someone did something like:

if (visibilitymap_test(...))
    visibilitymap_clear();

...because then memory-ordering issues could cause us to accidentally
fail to clear the bit.   No one should be doing that, though; the
relevant locking and conditional logic is built directly into
visibilitymap_clear().

On the flip side, if someone sees the visibility map bit as set when
it's actually just been cleared, that could cause a problem - most
seriously, index-only scans could return wrong answers.  For that to
happen, someone would have to insert a heap tuple onto a previously
all-visible page, clearing the visibility map bit, and then insert an
index tuple; concurrently, some other backend would need to see the
index tuple but not the fact that the visibility map bit had been
cleared.  I don't think that can happen: after inserting the heap
tuple, the inserting backend would release buffer content lock, which
acts as a full memory barrier; before reading the index tuple, the
index-only-scanning backend would have to take the content lock on the
index buffer, which also acts as a full memory barrier.  So the
inserter can't do the writes out of order, and the index-only-scanner
can't do the reads out of order, so I think it's safe.... but we
probably do need to explain that somewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to