On Wed, Jun 15, 2016 at 9:56 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro > <thomas.mu...@enterprisedb.com> wrote: >> I spent some time chasing down the exact circumstances. I suspect >> that there may be an interlocking problem in heap_update. Using the >> line numbers from cae1c788 [1], I see the following interaction >> between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all >> in reference to the same block number: >> >> [VACUUM] sets all visible bit >> >> [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, >> xmax_old_tuple); >> [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK); >> >> [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE); >> [SELECT] observes VM_ALL_VISIBLE as true >> [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state >> [SELECT] barfs >> >> [UPDATE] heapam.c:4116 visibilitymap_clear(...) > > Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2, > and CTID without logging anything or clearing the all-visible flag and > then releases the lock on the heap page to go do some more work that > might even ERROR out. Only if that other work all goes OK do we > relock the page and perform the WAL-logged actions. > > That doesn't seem like a good idea even in existing releases, because > you've taken a tuple on an all-visible page and made it not > all-visible, and you've made a page modification that is not > necessarily atomic without logging it. This is is particularly bad in > 9.6, because if that page is also all-frozen then XMAX will eventually > be pointing into space and VACUUM will never visit the page to > re-freeze it the way it would have done in earlier releases. However, > even in older releases, I think there's a remote possibility of data > corruption. Backend #1 makes these changes to the page, releases the > lock, and errors out. Backend #2 writes the page to the OS. DBA > takes a hot backup, tearing the page in the middle of XMAX. Oops. > > I'm not sure what to do about this: this part of the heap_update() > logic has been like this forever, and I assume that if it were easy to > refactor this away, somebody would have done it by now. >
How about changing collect_corrupt_items to acquire AccessExclusiveLock for safely checking? Regards, -- Masahiko Sawada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers