On Fri, Nov 5, 2021 at 7:51 PM Peter Geoghegan <p...@bowt.ie> wrote:
> Here are some specific checks I have in mind:

One more for the list:

* Validate PageIsAllVisible() for each page.

In other words, pg_visibility should be merged with verify_heapam.c (or at least pg_visibility's pg_check_frozen() and pg_check_visible() functions should be moved, merged, or whatever). This would mean that verify_heapam() would directly check if the page-level PD_ALL_VISIBLE flag contradicts either the tuple headers of tuples with storage on the page, the presence (or absence) of LP_DEAD stub line pointers on the page, or the corresponding visibility map bit (e.g., VISIBILITYMAP_ALL_VISIBLE) for the page.

There is value in teaching verify_heapam() about any possible problem, including with the visibility map, but it's certainly less valuable than the HOT chain verification stuff -- and probably trickier to get right. I'm mentioning it now to be exhaustive, but it's less of a priority for me personally.

I am quite willing to help out with all this, if you're interested.

One more thing about HOT chain validation: I can give you another example bug of the kind I'd expect verify_heapam() to catch only with full HOT chain validation. This one is a vintage MultiXact bug that has the same basic HOT chain corruption, looks-like-index-corruption-but-isn't quality as the more memorable freeze-the-dead bug (this one was fixed by commit 6bfa88ac):

https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb%2BbYkUcnCk%3DO67w0cSswPvV7XfUcU5g%40mail.gmail.com

In general I think that reviewing historic examples of pernicious corruption bugs is a valuable exercise when designing tools like amcheck. Maybe even revert the fix during testing, to be sure it would have been caught had the final tool been available. History doesn't repeat itself, but it does rhyme.

--
Peter Geoghegan
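
For illustration only, here is a rough, self-contained sketch of the PD_ALL_VISIBLE cross-check described above. It does not use the real PostgreSQL headers: ToyPage, ToyItem, ToyResult and toy_check_all_visible() are hypothetical names, the booleans stand in for the page flag and the visibility map bit, and per-tuple visibility is assumed to have been computed already by some real visibility test. The ordering of the checks is also just one possible choice, not what verify_heapam() does today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for line pointer states; not the real itemid.h values. */
typedef enum { TOY_LP_UNUSED, TOY_LP_NORMAL, TOY_LP_REDIRECT, TOY_LP_DEAD } ToyLpState;

typedef struct
{
    ToyLpState state;
    bool       visible_to_all;  /* assumed precomputed; only used for TOY_LP_NORMAL */
} ToyItem;

typedef struct
{
    bool           pd_all_visible;  /* models the page-level PD_ALL_VISIBLE flag */
    bool           vm_all_visible;  /* models the VISIBILITYMAP_ALL_VISIBLE bit */
    size_t         nitems;
    const ToyItem *items;
} ToyPage;

typedef enum
{
    TOY_OK,
    TOY_VM_SET_BUT_PAGE_FLAG_CLEAR,
    TOY_ALL_VISIBLE_WITH_LP_DEAD,
    TOY_ALL_VISIBLE_WITH_NONVISIBLE_TUPLE
} ToyResult;

/* Report the first contradiction found between the three sources of truth. */
ToyResult
toy_check_all_visible(const ToyPage *page)
{
    assert(page != NULL);

    /* The map bit must never be set while the page flag is clear. */
    if (page->vm_all_visible && !page->pd_all_visible)
        return TOY_VM_SET_BUT_PAGE_FLAG_CLEAR;

    /* A page that does not claim to be all-visible promises nothing more. */
    if (!page->pd_all_visible)
        return TOY_OK;

    for (size_t i = 0; i < page->nitems; i++)
    {
        const ToyItem *item = &page->items[i];

        /* LP_DEAD stubs contradict an all-visible page. */
        if (item->state == TOY_LP_DEAD)
            return TOY_ALL_VISIBLE_WITH_LP_DEAD;

        /* Every tuple with storage must be visible to everyone. */
        if (item->state == TOY_LP_NORMAL && !item->visible_to_all)
            return TOY_ALL_VISIBLE_WITH_NONVISIBLE_TUPLE;
    }

    return TOY_OK;
}
```

A real version would of course have to derive visible_to_all from the tuple headers (xmin/xmax, MultiXacts, hint bits versus commit status), which is where most of the difficulty lies.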