lazy_vacuum_page() does this:

1. START_CRIT_SECTION()
2. Remove dead tuples from the page, marking the itemid's unused.
3. MarkBufferDirty
4. if all remaining tuples on the page are visible to everyone, set the all-visible flag on the page, and call visibilitymap_set() to set the VM bit. 5 visibilitymap_set() writes a WAL record about setting the bit in the visibility map.
6. write the WAL record of removing the dead tuples.
7. END_CRIT_SECTION().

See the problem? Setting the VM bit is WAL-logged first, before the removal of the tuples. If you stop recovery between the two WAL records, the page's VM bit in the VM map will be set, but the dead tuples are still on the page.

This bug was introduced in Feb 2013, by commit fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master.

The fix seems quite straightforward, we just have to change the order of those two records. I'll go commit that.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to