On Thu, Nov 9, 2017 at 2:24 PM, Andres Freund <and...@anarazel.de> wrote:
> Attached is a version of the already existing regression test that both
> reproduces the broken hot chain (and thus failing index lookups) and
> then also the tuple reviving. I don't see any need for letting this run
> with arbitrary permutations.

I thought that the use of every possible permutation was excessive,
myself. It left us with an isolation test that didn't precisely
describe the behavior that is tested. What you came up with seems far,
far better, especially because of the comments you included. The mail
message-id references seem to add a lot, too.

> What I'm currently wondering about is how much we need to harden
> postgres against such existing corruption. If e.g. the hot chains are
> broken somebody might have reindexed thinking the problem is fixed - but
> if they then later vacuum everything goes to shit again, with dead rows
> reappearing.

I don't follow you here. Why would REINDEXing make the rows that
should be dead disappear again, even for a short period of time? It
might do so for index scans, I suppose, but not for sequential scans.
Are you concerned about a risk of somebody not noticing that
sequential scans are still broken?

Actually, on second thought, I take that back -- I don't think that
REINDEXing will even finish once a HOT chain is broken by the bug.
IndexBuildHeapScan() actually does quite a good job of making sure
that HOT chains are sane, which is how the enhanced amcheck notices
the bug here in practice. (Before this bug was discovered, I would
have expected amcheck to catch problems like this one slightly later,
during the Bloom filter probe for that HOT chain...but, in fact, it
never gets there with corruption from this bug in practice, AFAIK.)

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)