On Tue, Aug 25, 2020 at 10:36 AM Mark Dilger <mark.dil...@enterprisedb.com> wrote: > Thanks for the review!
+ msg OUT text + ) Looks like atypical formatting. +REVOKE ALL ON FUNCTION +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint) +FROM PUBLIC; This too. +-- Don't want this to be available to public Add "by default, but superusers can grant access" or so? I think there should be a call to pg_class_aclcheck() here, just like the one in pg_prewarm, so that if the superuser does choose to grant access, users given access can check tables they anyway have permission to access, but not others. Maybe put that in check_relation_relkind_and_relam() and rename it. Might want to look at the pg_surgery precedent, too. Oh, and that functions header comment is also wrong. I think that the way the checks on the block range are performed could be improved. Generally, we want to avoid reporting the same problem with a variety of different message strings, because it adds burden for translators and is potentially confusing for users. You've got two message strings that are only going to be used for empty relations and a third message string that is only going to be used for non-empty relations. What stops you from just ripping off the way that this is done in pg_prewarm, which requires only 2 messages? Then you'd be adding a net total of 0 new messages instead of 3, and in my view they would be clearer than your third message, "block range is out of bounds for relation with block count %u: " INT64_FORMAT " .. " INT64_FORMAT, which doesn't say very precisely what the problem is, and also falls afoul of our usual practice of avoiding the use of INT64_FORMAT in error messages that are subject to translation. I notice that pg_prewarm just silently does nothing if the start and end blocks are swapped, rather than generating an error. We could choose to do differently here, but I'm not sure why we should bother. + all_frozen = mapbits & VISIBILITYMAP_ALL_VISIBLE; + all_visible = mapbits & VISIBILITYMAP_ALL_FROZEN; + + if ((all_frozen && skip_option == SKIP_PAGES_ALL_FROZEN) || + (all_visible && skip_option == SKIP_PAGES_ALL_VISIBLE)) + { + continue; + } This isn't horrible style, but why not just get rid of the local variables? e.g. if (skip_option == SKIP_PAGES_ALL_FROZEN) { if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0) continue; } else { ... } Typically no braces around a block containing only one line. + * table contains corrupt all frozen bits, a concurrent vacuum might skip the all-frozen? + * relfrozenxid beyond xid.) Reporting the xid as valid under such conditions + * seems acceptable, since if we had checked it earlier in our scan it would + * have truly been valid at that time, and we break no MVCC guarantees by + * failing to notice the concurrent change in its status. I agree with the first half of this sentence, but I don't know what MVCC guarantees have to do with anything. I'd just delete the second part, or make it a lot clearer. + * Some kinds of tuple header corruption make it unsafe to check the tuple + * attributes, for example when the tuple is foreshortened and such checks + * would read beyond the end of the line pointer (and perhaps the page). In I think of foreshortening mostly as an art term, though I guess it has other meanings. Maybe it would be clearer to say something like "Some kinds of corruption make it unsafe to check the tuple attributes, for example when the line pointer refers to a range of bytes outside the page"? + * Other kinds of tuple header corruption do not bare on the question of bear + pstrdup(_("updating transaction ID marked incompatibly as keys updated and locked only"))); + pstrdup(_("updating transaction ID marked incompatibly as committed and as a multitransaction ID"))); "updating transaction ID" might scare somebody who thinks that you are telling them that you changed something. That's not what it means, but it might not be totally clear. Maybe: tuple is marked as only locked, but also claims key columns were updated multixact should not be marked committed + psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, has nulls)"), For these, how about: tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, has nulls) etc. + psprintf(_("old-style VACUUM FULL transaction ID is in the future: %u"), + psprintf(_("old-style VACUUM FULL transaction ID precedes freeze threshold: %u"), + psprintf(_("old-style VACUUM FULL transaction ID is invalid in this relation: %u"), old-style VACUUM FULL transaction ID %u is in the future old-style VACUUM FULL transaction ID %u precedes freeze threshold %u old-style VACUUM FULL transaction ID %u out of range %u..%u Doesn't the second of these overlap with the third? Similarly in other places, e.g. + psprintf(_("inserting transaction ID is in the future: %u"), I think this should change to: inserting transaction ID %u is in the future + else if (VARATT_IS_SHORT(chunk)) + /* + * could happen due to heap_form_tuple doing its thing + */ + chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT; Add braces here, since there are multiple lines. + psprintf(_("toast chunk sequence number not the expected sequence number: %u vs. %u"), toast chunk sequence number %u does not match expected sequence number %u There are more instances of this kind of thing. + psprintf(_("toasted attribute has unexpected TOAST tag: %u"), Remove colon. + psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"), Let's try to specify the attribute number in the attribute messages where we can, e.g. + psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"), How about: attribute %u with length %u should end at offset %u, but the tuple length is only %u + if (TransactionIdIsNormal(ctx->relfrozenxid) && + TransactionIdPrecedes(xmin, ctx->relfrozenxid)) + { + report_corruption(ctx, + /* translator: Both %u are transaction IDs. */ + psprintf(_("inserting transaction ID is from before freeze cutoff: %u vs. %u"), + xmin, ctx->relfrozenxid)); + fatal = true; + } + else if (!xid_valid_in_rel(xmin, ctx)) + { + report_corruption(ctx, + /* translator: %u is a transaction ID. */ + psprintf(_("inserting transaction ID is in the future: %u"), + xmin)); + fatal = true; + } This seems like good evidence that xid_valid_in_rel needs some rethinking. As far as I can see, every place where you call xid_valid_in_rel, you have checks beforehand that duplicate some of what it does, so that you can give a more accurate error message. That's not good. Either the message should be adjusted so that it covers all the cases "e.g. tuple xmin %u is outside acceptable range %u..%u" or we should just get rid of xid_valid_in_rel() and have separate error messages for each case, e.g. tuple xmin %u precedes relfrozenxid %u". I think it's OK to use terms like xmin and xmax in these messages, rather than inserting transaction ID etc. We have existing instances of that, and while someone might judge it user-unfriendly, I disagree. A person who is qualified to interpret this output must know what 'tuplex min' means immediately, but whether they can understand that 'inserting transaction ID' means the same thing is questionable, I think. This is not a full review, but in general I think that this is getting pretty close to being committable. The error messages seem to still need some polishing and I wouldn't be surprised if there are a few more bugs lurking yet, but I think it's come a long way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company