Re: new heapcheck contrib module
On Tue, Mar 2, 2021 at 1:24 PM Robert Haas wrote:
> Other than that 0001 looks to me to be in pretty good shape now.

Incidentally, we might want to move this to a new thread with a better subject line, since the current subject line really doesn't describe the uncommitted portion of the work. And create a new CF entry, too.

Moving on to 0002:

The index checking options should really be called btree index checking options. I think I'd put the table options first, and the btree options second. Other kinds of indexes could follow some day. I would personally omit the short forms of --heapallindexed and --parent-check; I think we'll run out of option names too quickly if people add more kinds of checks.

Perhaps VerifyBtreeSlotHandler should emit a warning of some kind if PQntuples(res) != 0.

+   /*
+    * Test that this function works, but for now we're not using the list
+    * 'relations' that it builds.
+    */
+   conn = connectDatabase(&cparams, progname, opts.echo, false, true);

This comment appears to have nothing to do with the code, since connectDatabase() does not build a list of 'relations'.

amcheck_sql seems to include paranoia, but do we need that if we're using a secure search path? Similarly for other SQL queries, e.g. in prepare_table_command.

It might not be strictly necessary for pg_amcheck.c to use_three completelyDifferent NamingConventions for its static functions.

should_processing_continue() is one semicolon over budget.

The initializer for opts puts a comma even after the last member initializer. Is that going to be portable to all compilers?

+   for (failed = false, cell = opts.include.head; cell; cell = cell->next)

I think failed has to be false here, because it gets initialized at the top of the function. If we need to reinitialize it for some reason, I would prefer you do that on the previous line, separate from the for loop stuff.
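For illustration, the style being suggested might look roughly like this; the list type and function are hypothetical stand-ins, not the actual pg_amcheck code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the pattern-list cells in pg_amcheck. */
typedef struct Cell
{
    int         value;
    struct Cell *next;
} Cell;

/*
 * Preferred style: if 'failed' must be reinitialized, do it on its own
 * line rather than folding it into the for-loop header.
 */
static bool
scan_list(Cell *head)
{
    bool        failed;
    Cell       *cell;

    failed = false;
    for (cell = head; cell; cell = cell->next)
    {
        if (cell->value < 0)
            failed = true;
    }
    return failed;
}
```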
+   char       *dbrgx;      /* Database regexp parsed from pattern, or NULL */
+   char       *nsprgx;     /* Schema regexp parsed from pattern, or NULL */
+   char       *relrgx;     /* Relation regexp parsed from pattern, or NULL */
+   bool        tblonly;    /* true if relrgx should only match tables */
+   bool        idxonly;    /* true if relrgx should only match indexes */

Maybe: db_regex, nsp_regex, rel_regex, table_only, index_only? Just because it seems theoretically possible that someone will see nsprgx and not immediately understand what it's supposed to mean, even if they know that nsp is a common abbreviation for namespace in PostgreSQL code, and even if they also know what a regular expression is.

Your four messages about there being nothing to check seem like they could be consolidated down to one: "nothing to check for pattern \"%s\"".

I would favor changing things so that once argument parsing is complete, we switch to reporting all errors that way. So in other words here, and everything that follows:

+   fprintf(stderr, "%s: no databases to check\n", progname);

+ * ParallelSlots based event loop follows.

"Main event loop."

To me it would read slightly better to change each reference to "relations list" to "list of relations", but perhaps that is too nitpicky.

I think the two instances of goto finish could be avoided with not much work. At most a few things need to happen only if !failed, and maybe not even that, if you just said "break;" instead.

+ * Note: Heap relation corruption is returned by verify_heapam() without the
+ * use of raising errors, but running verify_heapam() on a corrupted table may

How about "Heap relation corruption is reported by verify_heapam() via the result set, rather than by raising an ERROR, ..."

It seems mighty inefficient to have a whole bunch of consecutive calls to remove_relation_file() or corrupt_first_page() when every such call stops and restarts the database. I would guess these tests will run noticeably faster if you don't do that.
Either the functions need to take a list of arguments, or the stop/start needs to be pulled up and done in the caller.

corrupt_first_page() could use a comment explaining what exactly we're overwriting, and in particular noting that we don't want to just clobber the LSN, but rather something where we can detect a wrong value.

There's a long list of calls to command_checks_all() in 003_check.pl that don't actually check anything except that the command failed, but run it with a bunch of different options. I don't understand the value of that, and suggest reducing the number of cases tested. If you want, you can have tests elsewhere that focus -- perhaps by using verbose mode -- on checking that the right tables are being checked.

This is not yet a full review of everything in this patch -- I haven't sorted through all of the tests yet, or all of the new query construction.
Re: new heapcheck contrib module
On Tue, Mar 2, 2021 at 12:10 PM Mark Dilger wrote:
> On further reflection, I decided to implement these changes and not worry about the behavioral change.

Thanks.

> I skipped this part. The initcmd argument is only handed to ParallelSlotsGetIdle(). Doing as you suggest would not really be simpler, it would just move that argument to ParallelSlotsSetup(). But I don't feel strongly about it, so I can move this, too, if you like.
>
> I didn't do this either, and for the same reason. It's just a parameter to ParallelSlotsGetIdle(), so nothing is really gained by moving it to ParallelSlotsSetup().

OK. I thought it was more natural to pass a bunch of arguments at setup time rather than passing a bunch of arguments at get-idle time, but I don't feel strongly enough about it to insist, and somebody else can always change it later if they decide I had the right idea.

> Rather than having the slots user tweak the slot's ConnParams, ParallelSlotsGetIdle() takes a dbname argument, and uses it as ConnParams->override_dbname.

OK, but you forgot to update the comments. ParallelSlotsGetIdle() still talks about a cparams argument that it no longer has.

The usual idiom for sizing a memory allocation involving FLEXIBLE_ARRAY_MEMBER is something like offsetof(ParallelSlotArray, slots) + numslots * sizeof(ParallelSlot). Your version uses sizeof(); don't.

Other than that 0001 looks to me to be in pretty good shape now.

-- Robert Haas EDB: http://www.enterprisedb.com
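The sizing idiom mentioned above, sketched with simplified stand-in structs (the field contents here are illustrative, not the actual patch definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ParallelSlot
{
    void       *connection;
} ParallelSlot;

typedef struct ParallelSlotArray
{
    int         numslots;
    ParallelSlot slots[];       /* FLEXIBLE_ARRAY_MEMBER in PostgreSQL code */
} ParallelSlotArray;

/*
 * Conventional sizing: offsetof() up to the flexible member, plus the
 * array payload.  sizeof(ParallelSlotArray) should not be used here,
 * since it may include trailing padding and never counts the elements.
 */
static ParallelSlotArray *
make_slot_array(int numslots)
{
    ParallelSlotArray *sa;

    sa = calloc(1, offsetof(ParallelSlotArray, slots) +
                numslots * sizeof(ParallelSlot));
    sa->numslots = numslots;
    return sa;
}
```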
Re: new heapcheck contrib module
> On Mar 1, 2021, at 1:14 PM, Robert Haas wrote: > > On Wed, Feb 24, 2021 at 1:55 PM Mark Dilger > wrote: >> [ new patches ] > > Regarding 0001: > > There seem to be whitespace-only changes to the comment for select_loop(). > > I wonder if the ParallelSlotsSetupOneDB()/ParallelSlotsSetupMinimal() > changes could be simpler. First idea: Suppose you had > ParallelSlotsSetup(numslots) that just creates the slot array with 0 > connections, and then ParallelSlotsAdopt(slots, conn, cparams) if you > want to make it own an existing connection. That seems like it might > be cleaner. Second idea: Why not get rid of ParallelSlotsSetupOneDB() > altogether, and just let ParallelSlotsGetIdle() connect the other > slots as required? Preconnecting all slots before we do anything is > good because ... of what? Mostly because, if --jobs is set too high, you get an error before launching any work. I don't know that it's really a big deal if vacuumdb or reindexdb have a bunch of tasks kicked off prior to exit(1) due to not being able to open connections for all the slots, but it is a behavioral change. > I also wonder if things might be simplified by introducing a wrapper > object, e.g. ParallelSlotArray. Suppose a ParallelSlotArray stores the > number of slots (num_slots), the array of actual PGconn objects, and > the ConnParams to be used for new connections, and the initcmd to be > used for new connections. Maybe also the progname. This seems like it > would avoid a bunch of repeated parameter passing: you could just > create the ParallelSlotArray with the right contents, and then pass it > around everywhere, instead of having to keep passing the same stuff > in. If you want to switch to connecting to a different DB, you tweak > the ConnParams - maybe using an accessor function - and the system > figures the rest out. The existing version of parallel slots (before any of my changes) could already have been written that way, but the author chose not to. 
I thought about making the sort of change you suggest, and decided against, mostly on the principle of stare decisis. But the idea is very appealing, and since you're on board, I think I'll go make that change. > I wonder if it's really useful to generalize this to a point of caring > about all the ConnParams fields, too. Like, if you only provide > ParallelSlotUpdateDB(slotarray, dbname), then that's the only field > that can change so you don't need to care about the others. And maybe > you also don't really need to keep the ConnParams fields in every > slot, either. Like, couldn't you just say something like: if > (strcmp(PQdb(conn) , slotarray->cparams->dbname) != 0) { wrong DB, > can't reuse without a reconnect }? I know sometimes a dbname is really > a whole connection string, but perhaps we could try to fix that by > using PQconninfoParse() in the right place, so that what ends up in > the cparams is just the db name, not a whole connection string. I went a little out of my way to avoid that, as I didn't want the next application that uses parallel slots to have to refactor it again, if for example they want to process in parallel databases listening on different ports, or to process commands issued under different roles. > This is just based on a relatively short amount of time spent studying > the patch, so I might well be off-base here. What do you think? I like the ParallelSlotArray idea, and will go do that now. I'm happy to defer to your judgement on the other stuff, too, but will wait to hear back from you. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Wed, Feb 24, 2021 at 1:55 PM Mark Dilger wrote: > [ new patches ] Regarding 0001: There seem to be whitespace-only changes to the comment for select_loop(). I wonder if the ParallelSlotsSetupOneDB()/ParallelSlotsSetupMinimal() changes could be simpler. First idea: Suppose you had ParallelSlotsSetup(numslots) that just creates the slot array with 0 connections, and then ParallelSlotsAdopt(slots, conn, cparams) if you want to make it own an existing connection. That seems like it might be cleaner. Second idea: Why not get rid of ParallelSlotsSetupOneDB() altogether, and just let ParallelSlotsGetIdle() connect the other slots as required? Preconnecting all slots before we do anything is good because ... of what? I also wonder if things might be simplified by introducing a wrapper object, e.g. ParallelSlotArray. Suppose a ParallelSlotArray stores the number of slots (num_slots), the array of actual PGconn objects, and the ConnParams to be used for new connections, and the initcmd to be used for new connections. Maybe also the progname. This seems like it would avoid a bunch of repeated parameter passing: you could just create the ParallelSlotArray with the right contents, and then pass it around everywhere, instead of having to keep passing the same stuff in. If you want to switch to connecting to a different DB, you tweak the ConnParams - maybe using an accessor function - and the system figures the rest out. I wonder if it's really useful to generalize this to a point of caring about all the ConnParams fields, too. Like, if you only provide ParallelSlotUpdateDB(slotarray, dbname), then that's the only field that can change so you don't need to care about the others. And maybe you also don't really need to keep the ConnParams fields in every slot, either. Like, couldn't you just say something like: if (strcmp(PQdb(conn) , slotarray->cparams->dbname) != 0) { wrong DB, can't reuse without a reconnect }? 
I know sometimes a dbname is really a whole connection string, but perhaps we could try to fix that by using PQconninfoParse() in the right place, so that what ends up in the cparams is just the db name, not a whole connection string. This is just based on a relatively short amount of time spent studying the patch, so I might well be off-base here. What do you think? -- Robert Haas EDB: http://www.enterprisedb.com
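A rough sketch of the wrapper object being proposed above; all field layouts and the stand-in types are hypothetical, not a committed API:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the real ConnParams used by the scripts_parallel code. */
typedef struct ConnParams
{
    const char *dbname;
    const char *override_dbname;    /* takes precedence when set */
} ConnParams;

typedef struct ParallelSlot
{
    void       *connection;         /* PGconn * in the real code */
} ParallelSlot;

/*
 * The wrapper: slot count, connection template, init command, and
 * progname travel together instead of being passed around as separate
 * parameters to every function.
 */
typedef struct ParallelSlotArray
{
    int         numslots;
    ConnParams *cparams;
    const char *initcmd;
    const char *progname;
    ParallelSlot *slots;
} ParallelSlotArray;

/*
 * Accessor for switching databases: tweak the template; idle slots can
 * then reconnect lazily as they are handed out.
 */
static void
ParallelSlotUpdateDB(ParallelSlotArray *sa, const char *dbname)
{
    sa->cparams->override_dbname = dbname;
}
```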
Re: new heapcheck contrib module
On Tue, Feb 23, 2021 at 12:38 PM Mark Dilger wrote:
> This is changed in v40 as you propose to exit on FATAL and PANIC level errors and on error to send a query. On lesser errors (which includes all corruption reports about btrees and some heap corruption related errors), the slot's connection is still usable, I think. Are there cases where the error is lower than FATAL and yet the connection needs to be reestablished? It does not seem so from the testing I have done, but perhaps I'm not thinking of the right sort of non-fatal error?

I think you should assume that if you get an ERROR you can - and should - continue to use the connection, but still exit non-zero at the end. Perhaps one can contrive some scenario where that's not the case, but if the server does the equivalent of "ERROR: session permanently borked" we should really change those to FATAL; I think you can discount that possibility.

> In v40, exit(1) means the program encountered fatal errors leading it to stop, and exit(2) means that a non-fatal error and/or corruption reports occurred somewhere during the processing. Otherwise, exit(0) means your database was successfully checked and is healthy.

wfm.

-- Robert Haas EDB: http://www.enterprisedb.com
Re: new heapcheck contrib module
> On Feb 17, 2021, at 12:56 PM, Robert Haas wrote: > > On Wed, Feb 17, 2021 at 1:46 PM Mark Dilger > wrote: >> It will reconnect and retry a command one time on error. That should cover >> the case that the connection to the database was merely lost. If the second >> attempt also fails, no further retry of the same command is attempted, >> though commands for remaining relation targets will still be attempted, both >> for the database that had the error and for other remaining databases in the >> list. >> >> Assuming something is wrong with "db2", the command `pg_amcheck db1 db2 db3` >> could result in two failures per relation in db2 before finally moving on to >> db3. That seems pretty awful considering how many relations that could be, >> but failing to soldier on in the face of errors seems a strange design for a >> corruption checking tool. > > That doesn't seem right at all. I think a PQsendQuery() failure is so > remote that it's probably justification for giving up on the entire > operation. If it's caused by a problem with some object, it probably > means that accessing that object caused the whole database to go down, > and retrying the object will take the database down again. Retrying > the object is betting that the user interrupted connectivity between > pg_amcheck and the database but the interruption is only momentary and > the user actually wants to complete the operation. That seems unlikely > to me. I think it's far more probable that the database crashed or got > shut down and continuing is futile. > > My proposal is: if we get an ERROR trying to *run* a query, give up on > that object but still try the other ones after reconnecting. If we get > a FATAL or PANIC trying to *run* a query, give up on the entire > operation. If even sending a query fails, also give up. This is changed in v40 as you propose to exit on FATAL and PANIC level errors and on error to send a query. 
On lesser errors (which includes all corruption reports about btrees and some heap corruption related errors), the slot's connection is still usable, I think. Are there cases where the error is lower than FATAL and yet the connection needs to be reestablished? It does not seem so from the testing I have done, but perhaps I'm not thinking of the right sort of non-fatal error? (I'll wait to post v40 until after hearing your thoughts on this.) >> In v39, exit(1) is used for all errors which are intended to stop the >> program. It is important to recognize that finding corruption is not an >> error in this sense. A query to verify_heapam() can fail if the relation's >> checksums are bad, and that happens beyond verify_heapam()'s control when >> the page is not allowed into the buffers. There can be errors if the file >> backing a relation is missing. There may be other corruption error cases >> that I have not yet thought about. The connections' errors get reported to >> the user, but pg_amcheck does not exit as a consequence of them. As >> discussed above, failing to send the query to the server is not viewed as a >> reason to exit, either. It would be hard to quantify all the failure modes, >> but presumably the catalogs for a database could be messed up enough to >> cause such failures, and I'm not sure that pg_amcheck should just abort. > > I agree that exit(1) should happen after any error intended to stop > the program. But I think it should also happen at the end of the run > if we hit any problems for which we did not stop, so that exit(0) > means your database is healthy. In v40, exit(1) means the program encountered fatal errors leading it to stop, and exit(2) means that a non-fatal error and/or corruption reports occurred somewhere during the processing. Otherwise, exit(0) means your database was successfully checked and is healthy. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
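The v40 exit-status convention described above could be sketched like this; this is illustrative only, and the real pg_amcheck code is organized differently:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * 0: everything checked successfully and no corruption was found;
 * 1: a fatal error forced the program to stop early;
 * 2: non-fatal errors and/or corruption reports occurred during the run.
 */
static int
choose_exit_status(bool stopped_early, bool problems_reported)
{
    if (stopped_early)
        return 1;
    if (problems_reported)
        return 2;
    return 0;
}
```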
Re: new heapcheck contrib module
On Wed, Feb 17, 2021 at 1:46 PM Mark Dilger wrote: > Reworking the code took a while. Version 39 patches attached. Regarding the documentation, I think the Usage section at the top is far too extensive and duplicates the option description section to far too great an extent. You have 21 usage examples for a command with 34 options. Even if we think it's a good idea to give a brief summary of usage, it's got to be brief; we certainly don't need examples of obscure special-purpose options like --maintenance-db here. Looking through the commands in "PostgreSQL Client Applications" and "Additional Supplied Programs," most of them just have a synopsis section and nothing like this Usage section. Those that do have a Usage section typically use it for a narrative description of what to do with the tool (e.g. see pg_test_timing), not a long list of examples. I'm inclined to think you should nuke all the examples and incorporate the descriptive text, to the extent that it's needed, either into the descriptions of the individual options or, if the behavior spans many options, into the Description section. A few of these examples could move down into an Examples section at the bottom, perhaps, but I think 21 is still too many. I'd try to limit it to 5-7. Just hit the highlights. I also think that perhaps it's not best to break up the list of options into so many different categories the way you have. Notice that for example pg_dump and psql don't do this, instead putting everything into one ordered list, despite also having a lot of options. This is arguably worse if you want to understand which options are related to each other, but it's better if you are just looking for something based on alphabetical order. -- Robert Haas EDB: http://www.enterprisedb.com
Re: new heapcheck contrib module
On Wed, Feb 17, 2021 at 1:46 PM Mark Dilger wrote: > It will reconnect and retry a command one time on error. That should cover > the case that the connection to the database was merely lost. If the second > attempt also fails, no further retry of the same command is attempted, though > commands for remaining relation targets will still be attempted, both for the > database that had the error and for other remaining databases in the list. > > Assuming something is wrong with "db2", the command `pg_amcheck db1 db2 db3` > could result in two failures per relation in db2 before finally moving on to > db3. That seems pretty awful considering how many relations that could be, > but failing to soldier on in the face of errors seems a strange design for a > corruption checking tool. That doesn't seem right at all. I think a PQsendQuery() failure is so remote that it's probably justification for giving up on the entire operation. If it's caused by a problem with some object, it probably means that accessing that object caused the whole database to go down, and retrying the object will take the database down again. Retrying the object is betting that the user interrupted connectivity between pg_amcheck and the database but the interruption is only momentary and the user actually wants to complete the operation. That seems unlikely to me. I think it's far more probable that the database crashed or got shut down and continuing is futile. My proposal is: if we get an ERROR trying to *run* a query, give up on that object but still try the other ones after reconnecting. If we get a FATAL or PANIC trying to *run* a query, give up on the entire operation. If even sending a query fails, also give up. > In v39, exit(1) is used for all errors which are intended to stop the > program. It is important to recognize that finding corruption is not an > error in this sense. 
A query to verify_heapam() can fail if the relation's > checksums are bad, and that happens beyond verify_heapam()'s control when the > page is not allowed into the buffers. There can be errors if the file > backing a relation is missing. There may be other corruption error cases > that I have not yet thought about. The connections' errors get reported to > the user, but pg_amcheck does not exit as a consequence of them. As > discussed above, failing to send the query to the server is not viewed as a > reason to exit, either. It would be hard to quantify all the failure modes, > but presumably the catalogs for a database could be messed up enough to cause > such failures, and I'm not sure that pg_amcheck should just abort. I agree that exit(1) should happen after any error intended to stop the program. But I think it should also happen at the end of the run if we hit any problems for which we did not stop, so that exit(0) means your database is healthy. -- Robert Haas EDB: http://www.enterprisedb.com
Re: new heapcheck contrib module
On Thu, Feb 4, 2021 at 11:10 AM Mark Dilger wrote: > Numbered 0001 in this next patch set. Hi, I committed 0001 as you had it and 0002 with some more cleanups. Things I did: - Adjusted some comments. - Changed processQueryResult so that it didn't do foo(bar) with foo being a pointer. Generally we prefer (*foo)(bar) when it can be confused with a direct function call, but wunk->foo(bar) is also considered acceptable. - Changed the return type of ParallelSlotResultHandler to be bool, because having it return PGresult * seemed to offer no advantages. -- Robert Haas EDB: http://www.enterprisedb.com
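The call-style convention mentioned above, in a self-contained sketch (the names here are invented for illustration):

```c
#include <assert.h>

typedef int (*ResultHandler) (int value);

static int
double_it(int value)
{
    return 2 * value;
}

typedef struct Slot
{
    ResultHandler handler;
} Slot;

/*
 * foo(bar) where foo is a function pointer reads like a direct call, so
 * (*foo)(bar) is preferred there; wunk->foo(bar) is also considered
 * acceptable because the struct access already signals the indirection.
 */
static int
invoke_both(Slot *slot, ResultHandler fn, int value)
{
    int         a = (*fn) (value);          /* explicitly through a pointer */
    int         b = slot->handler(value);   /* clear enough as written */

    return a + b;
}
```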
Re: new heapcheck contrib module
On Thu, Feb 4, 2021 at 11:10 AM Mark Dilger wrote: > I also made changes to clean up 0003 (formerly numbered 0004) "deduplice" is a typo. I'm not sure that I agree with check_each_database()'s commentary about why it doesn't make sense to optimize the resolve-the-databases step. Like, suppose I type 'pg_amcheck sasquatch'. I think the way you have it coded it's going to tell me that there are no databases to check, which might make me think I used the wrong syntax or something. I want it to tell me that sasquatch does not exist. If I happen to be a cryptid believer, I may reject that explanation as inaccurate, but at least there's no question about what pg_amcheck thinks the problem is. Why does check_each_database() go out of its way to run the main query without the always-secure search path? If there's a good reason, I think it deserves a comment saying what the reason is. If there's not a good reason, then I think it should use the always-secure search path for 100% of everything. Same question applies to check_one_database(). ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler, sql.data) could stand to be split over two lines, like you do for the nearby run_command() call, so that it doesn't go past 80 columns. I suggest having two variables instead of one for amcheck_schema. Using the same variable to store the unescaped value and then later the escaped value is, IMHO, confusing. Whatever you call the escaped version, I'd rename the function parameters elsewhere to match. "status = PQsendQuery(conn, sql) == 1" seems a bit uptight to me. Why not just make status an int and then just "status = PQsendQuery(conn, sql)" and then test for status != 0? I don't really care if you don't change this, it's not actually important. But personally I'd rather code it as if any non-zero value meant success. I think the pg_log_error() in run_command() could be worded a bit better. 
I don't think it's a good idea to try to include the type of object in there like this, because of the translatability guidelines around assembling messages from fragments. And I don't think it's good to say that the check failed because the reality is that we weren't able to ask for the check to be run in the first place. I would rather log this as something like "unable to send query: %s". I would also assume we need to bail out entirely if that happens. I'm not totally sure what sorts of things can make PQsendQuery() fail but I bet it boils down to having lost the server connection. Should that occur, trying to send queries for all of the remaining objects is going to result in repeating the same error many times, which isn't going to be what anybody wants. It's unclear to me whether we should give up on the whole operation but I think we have to at least give up on that connection... unless I'm confused about what the failure mode is likely to be here. It looks to me like the user won't be able to tell by the exit code what happened. What I did with pg_verifybackup, and what I suggest we do here, is exit(1) if anything went wrong, either in terms of failing to execute queries or in terms of those queries returning problem reports. With pg_verifybackup, I thought about trying to make it like 0 => backup OK, 1 => backup not OK, 2 => trouble, but I found it too hard to distinguish what should be exit(1) and what should be exit(2) and the coding wasn't trivial either, so I went with the simpler scheme. The opening line of appendDatabaseSelect() could be adjusted to put the regexps parameter on the next line, avoiding awkward wrapping. If they are being run with a safe search path, the queries in appendDatabaseSelect(), appendSchemaSelect(), etc. could be run without all the paranoia. If not, maybe they should be. The casts to text don't include the paranoia: with an unsafe search path, we need pg_catalog.text here. 
Or no cast at all, which seems like it ought to be fine too. Not quite sure why you are doing all that casting to text; the datatype is presumably 'name' and ought to collate like "C", which is probably fine. It would probably be a better idea for appendSchemaSelect to declare a PQExpBuffer and call initPQExpBuffer just once, and then resetPQExpBuffer after each use, and finally termPQExpBuffer just once. The way you have it is not expensive enough to really matter, but avoiding repeated allocate/free cycles is probably best. I wonder if a pattern like .foo.bar ends up meaning the same thing as a pattern like foo.bar, with the empty database name being treated the same as if nothing were specified. From the way appendTableCTE() is coded, it seems to me that if I ask for tables named j* excluding tables named jam* I still might get toast tables for my jam, which seems wrong. There does not seem to be any clear benefit to defining CT_TABLE = 0 in this case, so I would let the compiler deal with it. We should not be depending on that to have any particular numeric value. Why does pg_amcheck.c have a hea
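On the PQsendQuery() style point a few paragraphs up: the suggestion is simply to treat any nonzero return as success rather than comparing against 1. With a stand-in for PQsendQuery() (the real function returns 1 on success and 0 on failure), the preferred test looks like:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for PQsendQuery(): nonzero on success, 0 on failure. */
static int
fake_send_query(bool succeed)
{
    return succeed ? 1 : 0;
}

static bool
send_one_query(bool succeed)
{
    int         status = fake_send_query(succeed);

    /* Preferred: test against zero, not "== 1". */
    if (status == 0)
        return false;
    return true;
}
```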
Re: new heapcheck contrib module
On Tue, Feb 2, 2021 at 6:10 PM Mark Dilger wrote: > 0001 -- no changes Committed. > 0002 -- fixing omissions in @pgfeutilsfiles in file > src/tools/msvc/Mkvcbuild.pm Here are a few minor cosmetic issues with this patch: - connect_utils.c lacks a file header comment. - Some or perhaps all of the other file header comments need an update for 2021. - There's bogus hunks in the diff for string_utils.c. I think the rest of this looks good. I spent a long time puzzling over whether consumeQueryResult() and processQueryResult() needed to be moved, but then I realized that this patch actually makes them into static functions inside parallel_slot.c, rather than public functions as they were before. I like that. The only reason those functions need to be moved at all is so that the scripts_parallel/parallel_slot stuff can continue to do its thing, so this is actually a better way of grouping things together than what we have now. > 0003 -- no changes I think it would be better if there were no handler by default, and failing to set one leads to an assertion failure when we get to the point where one would be called. I don't think I understand the point of renaming processQueryResult and consumeQueryResult. Isn't that just code churn for its own sake? PGresultHandler seems too generic. How about ParallelSlotHandler or ParallelSlotResultHandler? I'm somewhat inclined to propose s/ParallelSlot/ConnectionSlot/g but I guess it's better not to get sucked into renaming things. It's a little strange that we end up with mutators to set the slot's handler and handler context when we elsewhere feel free to monkey with a slot's connection directly, but it's not a perfect world and I can't think of anything I'd like better. -- Robert Haas EDB: http://www.enterprisedb.com
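The "no handler by default" idea could look roughly like this; it is a hypothetical sketch where the type names follow the thread's terminology but the details are invented:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*PGresultHandler) (void *res, void *context);

typedef struct ParallelSlot
{
    PGresultHandler handler;        /* NULL until explicitly installed */
    void       *handler_context;
} ParallelSlot;

static int  handled_count = 0;

static void
counting_handler(void *res, void *context)
{
    (void) res;
    (void) context;
    handled_count++;
}

/*
 * Dispatch asserts rather than silently skipping when the caller forgot
 * to install a handler, which is the failure mode being proposed.
 */
static void
dispatch_result(ParallelSlot *slot, void *res)
{
    assert(slot->handler != NULL);
    slot->handler(res, slot->handler_context);
}
```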
Re: new heapcheck contrib module
> On Jan 28, 2021, at 9:41 AM, Mark Dilger wrote: > > > >> On Jan 28, 2021, at 9:13 AM, Robert Haas wrote: >> >> I like 0007 quite a bit and am inclined to commit it soon, as it >> doesn't depend on the earlier patches. But: >> >> - I think the residual comment in processSQLNamePattern beginning with >> "Note:" could use some wordsmithing to account for the new structure >> of things -- maybe just "this pass" -> "this function". >> - I suggest changing initializations like maxbuf = buf + 2 to maxbuf = >> &buf[2] for clarity. > > Ok, I should be able to get you an updated version of 0007 with those changes > here soon for you to commit. I made those changes, and fixed a bug that would impact the pg_amcheck callers. I'll have to extend the regression test coverage in 0008 since it obviously wasn't caught, but that's not part of this patch since there are no callers that use the dbname.schema.relname format as yet. This is the only patch for v34, since you want to commit it separately. It's renamed as 0001 here: v34-0001-Refactoring-processSQLNamePattern.patch — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Jan 28, 2021, at 9:49 AM, Robert Haas wrote: > > On Thu, Jan 28, 2021 at 12:40 PM Mark Dilger > wrote: >>> On Jan 28, 2021, at 9:13 AM, Robert Haas wrote: >>> If I run pg_amcheck --all -j4 do I get a serialization boundary across >>> databases? Like, I have to completely finish db1 before I can go onto >>> db2, even though maybe only one worker is still busy with it? >> >> Yes, you do. That's patterned on reindexdb and vacuumdb. > > Sounds lame, but fair enough. We can leave that problem for another day. Yeah, I agree that it's lame, and should eventually be addressed. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Thu, Jan 28, 2021 at 12:40 PM Mark Dilger wrote: > > On Jan 28, 2021, at 9:13 AM, Robert Haas wrote: > > If I run pg_amcheck --all -j4 do I get a serialization boundary across > > databases? Like, I have to completely finish db1 before I can go onto > > db2, even though maybe only one worker is still busy with it? > > Yes, you do. That's patterned on reindexdb and vacuumdb. Sounds lame, but fair enough. We can leave that problem for another day. -- Robert Haas EDB: http://www.enterprisedb.com
Re: new heapcheck contrib module
> On Jan 28, 2021, at 9:13 AM, Robert Haas wrote: > > I like 0007 quite a bit and am inclined to commit it soon, as it > doesn't depend on the earlier patches. But: > > - I think the residual comment in processSQLNamePattern beginning with > "Note:" could use some wordsmithing to account for the new structure > of things -- maybe just "this pass" -> "this function". > - I suggest changing initializations like maxbuf = buf + 2 to maxbuf = > &buf[2] for clarity. Ok, I should be able to get you an updated version of 0007 with those changes here soon for you to commit. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Jan 28, 2021, at 9:13 AM, Robert Haas wrote: > > If I run pg_amcheck --all -j4 do I get a serialization boundary across > databases? Like, I have to completely finish db1 before I can go onto > db2, even though maybe only one worker is still busy with it? Yes, you do. That's patterned on reindexdb and vacuumdb. Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
I like 0007 quite a bit and am inclined to commit it soon, as it doesn't depend on the earlier patches. But: - I think the residual comment in processSQLNamePattern beginning with "Note:" could use some wordsmithing to account for the new structure of things -- maybe just "this pass" -> "this function". - I suggest changing initializations like maxbuf = buf + 2 to maxbuf = &buf[2] for clarity. Regarding 0001: - My preference would be to dump on_exit_nicely_final() and just rely on order of registration. - I'm not entirely sure it's a good idea to expose something named fatal() like this, because that's a fairly short and general name. On the other hand, it's pretty descriptive and it's not clear why someone including exit_utils.h would want any other definition. I guess we can always change it later if it proves to be problematic; it's got a lot of callers and I guess there's no point in churning the code without a clear reason. - I don't quite see why we need this at all. Like, exit_nicely() is a pg_dump-ism. It would make sense to centralize it if we were going to use it for pg_amcheck, but you don't. If you were going to, you'd need to adapt 0003 to use exit_nicely() instead of exit(), but you don't, nor do you add any other new calls to exit_nicely() anywhere, except for one in 0002. That makes the PGresultHandler stuff depend on exit_nicely(), which might be important if you were going to refactor pg_dump to use that abstraction, but you don't. I'm not opposed to the idea of centralized exit processing for frontend utilities; indeed, it seems like a good idea. But this doesn't seem to get us there. AFAICS it just entangles pg_dump with pg_amcheck unnecessarily in a way that doesn't really benefit either of them. Regarding 0002: - I don't think this is separately committable because it adds an abstraction but not any uses of that abstraction to demonstrate that it's actually any good.
Perhaps it should just be merged into 0005, and even into parallel_slot.h vs. having its own header. I'm not really sure about that, though - Is this really much of an abstraction layer? Like, how generic can this be when the argument list includes ExecStatusType expected_status and int expected_ntups? - The logic seems to be very similar to some of the stuff that you move around in 0003, like executeQuery() and executeCommand(), but it doesn't get unified. I'm not necessarily saying it should be, but it's weird to do all this refactoring and end up with something that still looks inconsistent.

0003, 0004, and 0006 look pretty boring; they are just moving code around. Is there any point in splitting the code from 0003 across two files? Maybe it's fine. If I run pg_amcheck --all -j4 do I get a serialization boundary across databases? Like, I have to completely finish db1 before I can go onto db2, even though maybe only one worker is still busy with it? -- Robert Haas EDB: http://www.enterprisedb.com
Re: new heapcheck contrib module
On Mon, Jan 11, 2021 at 1:16 PM Mark Dilger wrote: > Added in v32, along with adding pg_amcheck to @contrib_uselibpq, > @contrib_uselibpgport, and @contrib_uselibpgcommon exit_utils.c fails to achieve the goal of making this code independent of pg_dump, because of: #ifdef WIN32 if (parallel_init_done && GetCurrentThreadId() != mainThreadId) _endthreadex(code); #endif parallel_init_done is a pg_dump-ism. Perhaps this chunk of code could be a handler that gets registered using exit_nicely() rather than hard-coded like this. Note that the function comments for exit_nicely() are heavily implicated in this problem, since they also apply to stuff that only happens in pg_dump and not other utilities. I'm skeptical about the idea of putting functions into string_utils.c with names as generic as include_filter() and exclude_filter(). Existing cases like fmtId() and fmtQualifiedId() are not great either, but I think this is worse and that we should do some renaming. On a related note, it's not clear to me why these should be classified as string_utils while stuff like expand_schema_name_patterns() gets classified as option_utils. These are neither generic string-processing functions nor are they generic options-parsing functions. They are functions for expanding shell-glob style patterns for database object names. And they seem like they ought to be together, because they seem to do closely-related things. I'm open to an argument that this is wrongheaded on my part, but it looks weird to me the way it is. I'm pretty unimpressed by query_utils.c. The CurrentResultHandler stuff looks grotty, and you don't seem to really use it anywhere. And it seems woefully overambitious to me anyway: this doesn't apply to every kind of "result" we've got hanging around, nor anything even close to that, even though a name like CurrentResultHandler sounds very broad.
It also means more global variables, which is a thing of which the PostgreSQL codebase already has a deplorable oversupply. quiet_handler() and noop_handler() aren't used anywhere either, AFAICS. I wonder if it would be better to pass in callbacks rather than relying on global variables. e.g.: typedef void (*fatal_error_callback)(const char *fmt,...) pg_attribute_printf(1, 2) pg_attribute_noreturn(); Then you could have a few helper functions that take an argument of type fatal_error_callback and throw the right fatal error for (a) wrong PQresultStatus() and (b) result is not one row. Do you need any other cases? exiting_handler() seems to think that the caller might want to allow any number of tuples, or any positive number, or any particular count, but I'm not sure if all of those cases are really needed. This stuff is finicky and hard to get right. You don't really want to create a situation where the same code keeps getting duplicated, or the behavior's just a little bit inconsistent everywhere, but it also isn't great to build layers upon layers of abstraction around something like ExecuteSqlQuery which is, in the end, a four-line function. I don't think there's any problem with something like pg_dump having its own function to execute-a-query-or-die. Maybe that function ends up doing something like TheGenericFunctionToExecuteOrDie(my_die_fn, the_query), or maybe pg_dump can just open-code it but have a my_die_fn to pass down to the glob-expansion stuff, or, well, I don't know. -- Robert Haas EDB: http://www.enterprisedb.com
Re: new heapcheck contrib module
On Fri, Jan 8, 2021 at 6:33 AM Mark Dilger wrote: > The attached patches, v31, are mostly the same, but with "getopt_long.h" > included from pg_amcheck.c per Thomas's review, and a .gitignore file added > in contrib/pg_amcheck/ A couple more little things from Windows CI: C:\projects\postgresql\src\include\fe_utils/option_utils.h(19): fatal error C1083: Cannot open include file: 'libpq-fe.h': No such file or directory [C:\projects\postgresql\pg_amcheck.vcxproj] Does contrib/amcheck/Makefile need to say "SHLIB_PREREQS = submake-libpq" like other contrib modules that use libpq? pg_backup_utils.obj : error LNK2001: unresolved external symbol exit_nicely [C:\projects\postgresql\pg_dump.vcxproj] I think this is probably because additions to src/fe_utils/Makefile's OBJS list need to be manually replicated in src/tools/msvc/Mkvcbuild.pm's @pgfeutilsfiles list. (If I'm right about that, perhaps it needs a comment to remind us Unix hackers of that, or perhaps it should be automated...)
Re: new heapcheck contrib module
> On Nov 19, 2020, at 11:47 AM, Peter Geoghegan wrote: > > On Thu, Nov 19, 2020 at 9:06 AM Robert Haas wrote: >> I'm also not sure if these descriptions are clear enough, but it may >> also be hard to do a good job in a brief space. Still, comparing this >> to the documentation of heapallindexed makes me rather nervous. This >> is only trying to verify that the index contains all the tuples in the >> heap, not that the values in the heap and index tuples actually match. > > That's a good point. As things stand, heapallindexed verification does > not notice when there are extra index tuples in the index that are in > some way inconsistent with the heap. Hopefully this isn't too much of > a problem in practice because the presence of extra spurious tuples > gets detected by the index structure verification process. But in > general that might not happen. > > Ideally heapallindex verification would verify 1:1 correspondence. It > doesn't do that right now, but it could. > > This could work by having two bloom filters -- one for the heap, > another for the index. The implementation would look for the absence > of index tuples that should be in the index initially, just like > today. But at the end it would modify the index bloom filter by &= it > with the complement of the heap bloom filter. If any bits are left set > in the index bloom filter, we go back through the index once more and > locate index tuples that have at least some matching bits in the index > bloom filter (we cannot expect all of the bits from each of the hash > functions used by the bloom filter to still be matches). > > From here we can do some kind of lookup for maybe-not-matching index > tuples that we locate. Make sure that they point to an LP_DEAD line > item in the heap or something. Make sure that they have the same > values as the heap tuple if they're still retrievable (i.e. if we > haven't pruned the heap tuple away already). 
This approach sounds very good to me, but beyond the scope of what I'm planning for this release cycle. >> This to me seems too conservative. The result is that by default we >> check only tables, not indexes. I don't think that's going to be what >> users want. I don't know whether they want the heapallindexed or >> rootdescend behaviors for index checks, but I think they want their >> indexes checked. Happy to hear opinions from actual users on what they >> want; this is just me guessing that you've guessed wrong. :-) > > My thoughts on these two options: > > * I don't think that users will ever want rootdescend verification. > > That option exists now because I wanted to have something that relied > on the uniqueness property of B-Tree indexes following the Postgres 12 > work. I didn't add retail index tuple deletion, so it seemed like a > good idea to have something that makes the same assumptions that it > would have to make. To validate the design. > > Another factor is that Alexander Korotkov made the basic > bt_index_parent_check() tests a lot better for Postgres 13. This > undermined the practical argument for using rootdescend verification. The latest version of the patch has rootdescend off by default, but a switch to turn it on. The documentation for that switch in doc/src/sgml/pgamcheck.sgml summarizes your comments: + This form of verification was originally written to help in the + development of btree index features. It may be of limited or even of no + use in helping detect the kinds of corruption that occur in practice. + In any event, it is known to be a rather expensive check to perform. For my own self, I don't care if rootdescend is an option in pg_amcheck. You and Robert expressed somewhat different opinions, and I tried to split the difference. I'm happy to go a different direction if that's what the consensus is. 
> Finally, note that bt_index_parent_check() was always supposed to be > something that was to be used only when you already knew that you had > big problems, and wanted absolutely thorough verification without > regard for the costs. This isn't the common case at all. It would be > reasonable to not expose anything from bt_index_parent_check() at all, > or to give it much less prominence. Not really sure of what the right > balance is here myself, so I'm not insisting on anything. Just telling > you what I know about it. This still needs work. Currently, there is a switch to turn off index checking, with the checks on by default. But there is no switch controlling which kind of check is performed (bt_index_check vs. bt_index_parent_check). Making matters more complicated, selecting both rootdescend and bt_index_check wouldn't make sense, as there is no rootdescend option on that function. So users would need multiple flags to turn on various options, with some flag combinations drawing an error about the flags not being mutually compatible. That's doable, but people may not like that interface. > * heapallind
Re: new heapcheck contrib module
On Tue, Oct 27, 2020 at 5:12 AM Mark Dilger wrote: > The v20 patches 0002, 0003, and 0005 still apply cleanly, but 0004 required a > rebase. (0001 was already committed last week.) > > Here is a rebased set of 4 patches, numbered 0002..0005 to be consistent with > the previous naming. There are no substantial changes. Hi Mark, The command line stuff fails to build on Windows[1]. I think it's just missing #include "getopt_long.h" (see contrib/vacuumlo/vacuumlo.c). [1] https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.123328
Re: new heapcheck contrib module
On Thu, Nov 19, 2020 at 1:50 PM Mark Dilger wrote: > It makes sense to me to have a "don't run through minefields" option, and a > "go ahead, run through minefields" option for pg_amcheck, given that users in > differing situations will have differing business consequences to bringing > down the server in question. This kind of framing suggests zero-risk bias to me: https://en.wikipedia.org/wiki/Zero-risk_bias It's simply not helpful to think of the risks as "running through a minefield" versus "not running through a minefield". I also dislike this framing because in reality nobody runs through a minefield, unless maybe it's a battlefield and the alternative is probably even worse. Risks are not discrete -- they're continuous. And they're situational. I accept that there are certain reasonable gradations in the degree to which a segfault is bad, even in contexts in which pg_amcheck runs into actual serious problems. And as Robert points out, experience suggests that on average people care about availability the most when push comes to shove (though I hasten to add that that's not the same thing as considering a once-off segfault to be the greater evil here). Even still, I firmly believe that it's a mistake to assign *infinite* weight to not having a segfault. That is likely to have certain unintended consequences that could be even worse than a segfault, such as not detecting pernicious corruption over many months because our can't-segfault version of core functionality fails to have the same bugs as the actual core functionality (and thus fails to detect a problem in the core functionality). The problem with giving infinite weight to any one bad outcome is that it makes it impossible to draw reasonable distinctions between it and some other extreme bad outcome. For example, I would really not like to get infected with Covid-19. But I also think that it would be much worse to get infected with Ebola. 
It follows that Covid-19 must not be infinitely bad, because if it is then I can't make this useful distinction -- which might actually matter. If somebody hears me say this, and takes it as evidence of my lackadaisical attitude towards Covid-19, I can live with that. I care about avoiding criticism as much as the next person, but I refuse to prioritize it over all other things. > I doubt other backend hardening is any more likely to get committed. I suspect you're right about that. Because of the risks of causing real harm to users. The backend code is obviously *not* written with the assumption that data cannot be corrupt. There are lots of specific ways in which it is hardened (e.g., there are many defensive "can't happen" elog() statements). I really don't know why you insist on this black and white framing. -- Peter Geoghegan
Re: new heapcheck contrib module
> On Nov 19, 2020, at 11:47 AM, Peter Geoghegan wrote: > >> I think in general you're worrying too much about the possibility of >> this tool causing backend crashes. I think it's good that you wrote >> the heapcheck code in a way that's hardened against that, and I think >> we should try to harden other things as time permits. But I don't >> think that the remote possibility of a crash due to the lack of such >> hardening should dictate the design behavior of this tool. If the >> crash possibilities are not remote, then I think the solution is to >> fix them, rather than cutting out important checks. > > I couldn't agree more. Owing to how much run-time overhead it would entail, much of the backend code has not been, and probably will not be, hardened against corruption. The amcheck code uses backend code for accessing heaps and indexes. Only some of those uses can be preceded with sufficient safety checks to avoid stepping on landmines. It makes sense to me to have a "don't run through minefields" option, and a "go ahead, run through minefields" option for pg_amcheck, given that users in differing situations will have differing business consequences to bringing down the server in question. As an example that we've already looked at, checking the status of an xid against clog is a dangerous thing to do. I wrote a patch to make it safer to query clog (0003) and a patch for pg_amcheck to use the safer interface (0004) and it looks unlikely either of those will ever be committed. I doubt other backend hardening is any more likely to get committed. It doesn't follow that if crash possibilities are not remote that we should therefore harden the backend. The performance considerations of the backend are not well aligned with the safety considerations of this tool. The backend code is written with the assumption of non-corrupt data, and this tool with the assumption of corrupt data, or at least a fair probability of corrupt data. 
I don't see how any one-hardening-fits-all will ever work. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Thu, Nov 19, 2020 at 12:06 PM Robert Haas wrote: > I originally intended to review the docs and regression tests in the > same email as the patch itself, but this email has gotten rather long > and taken rather longer to get together than I had hoped, so I'm going > to stop here for now and come back to that stuff. Broad question: Does pg_amcheck belong in src/bin, or in contrib? You have it in the latter place, but I'm not sure if that's the right idea. I'm not saying it *isn't* the right idea, but I'm just wondering what other people think. Now, on to the docs: + Currently, this requires execute privileges on amcheck's + bt_index_parent_check and verify_heapam This makes me wonder why there isn't an option to call bt_index_check() rather than bt_index_parent_check(). It doesn't seem to be standard practice to include the entire output of the command's --help option in the documentation. That means as soon as anybody changes anything they've got to change the documentation too. I don't see anything like that in the pages for psql or vacuumlo or pg_verifybackup. It also doesn't seem like a useful thing to do. Anyone who is reading the documentation probably is in a position to try --help if they wish; they don't need that duplicated here. Looking at those other pages, what seems to be typical for an SGML reference page is to list all the options and give a short paragraph on what each one does. What you have instead is a narrative description. I recommend looking over the reference page for one of those other command-line utilities and adapting it to this case. Back to the code: +static const char * +get_index_relkind_quals(void) +{ + if (!index_relkind_quals) + index_relkind_quals = psprintf("'%c'", RELKIND_INDEX); + return index_relkind_quals; +} I feel like there ought to be a way to work this out at compile time rather than leaving it to runtime.
I think that replacing the function body with "return CppAsString2(RELKIND_INDEX);" would have the same result, and once you do that you don't really need the function any more. This is arguably cheating a bit: RELKIND_INDEX is defined as 'i' and CppAsString2() turns that into a string containing those three characters. That happens to work because what we want to do is quote this for use in SQL, and SQL happens to use single quotes for literals just like C does for individual characters. It would be more elegant to figure out a way to interpolate just the character into a C string, but I don't know of a macro trick that will do that. I think one could write char something[] = { '\'', RELKIND_INDEX, '\'', '\0' } but that would be pretty darn awkward for the table case where you want an ANY with three relkinds in there. But maybe you could get around that by changing the query slightly. Suppose instead of relkind = BLAH, you write POSITION(relkind IN '%s') > 0. Then you could just have the caller pass either: char index_relkinds[] = { RELKIND_INDEX, '\0' }; -or- char table_relkinds[] = { RELKIND_RELATION, RELKIND_MATVIEW, RELKIND_TOASTVALUE, '\0' }; The patch actually has RELKIND_PARTITIONED_TABLE there rather than RELKIND_RELATION, but that seems wrong to me, because partitioned tables don't have storage, and toast tables do. And if we're going to include RELKIND_PARTITIONED_TABLE for some reason, then why not RELKIND_PARTITIONED_INDEX for the index case? On the tests: I think 003_check.pl needs to stop and restart the database between populating the tables and corrupting them. Otherwise, how do we know that the subsequent checks are going to actually see the corruption rather than something already cached in memory? There are some philosophical questions to consider too, about how these tests are written and what our philosophy ought to be here, but I am again going to push that off to a future email.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Thu, Nov 19, 2020 at 2:48 PM Peter Geoghegan wrote: > Ideally heapallindex verification would verify 1:1 correspondence. It > doesn't do that right now, but it could. Well, that might be a cool new mode, but it doesn't necessarily have to supplant the thing we have now. The problem immediately before us is just making sure that the user can understand what we will and won't be checking. > My thoughts on these two options: > > * I don't think that users will ever want rootdescend verification. That seems too absolute. I think it's fine to say, we don't think that users will want this, so let's not do it by default. But if it's so useless as to not be worth a command-line option, then it was a mistake to put it into contrib at all. Let's expose all the things we have, and try to set the defaults according to what we expect to be most useful. > * heapallindexed is kind of expensive, but valuable. But the extra > check is probably less likely to help on the second or subsequent > index on a table. > > It might be worth considering an option that only uses it with only > one index: Preferably the primary key index, failing that some unique > index, and failing that some other index. This seems a bit too clever for me. I would prefer a simpler schema, where we choose the default we think most people will want and use it for everything -- and allow the user to override. > Even if your user is just average, they still have one major advantage > over the architects of pg_amcheck: actual knowledge of the problem in > front of them. Quite so. > I think that you need to have a kind of epistemic modesty with this > stuff. Okay, we guarantee that the backend won't crash when certain > amcheck functions are run, based on these caveats. But don't we always > guarantee something like that? And are the specific caveats actually > that different in each case, when you get right down to it? A > guarantee does not exist in a vacuum. It always has implicit > limitations. 
For example, any guarantee implicitly comes with the > caveat "unless I, the guarantor, am wrong". Yep. > I'm also suspicious of guarantees like this for less philosophical > reasons. It seems to me like it solves our problem rather than the > user's problem. Having data that is so badly corrupt that it's > difficult to avoid segfaults when we perform some kind of standard > transformations on it is an appalling state of affairs for the user. > The segfault itself is very much not the point at all. I mostly agree with everything you say here, but I think we need to be careful not to accept the position that seg faults are no big deal. Consider the following users, all of whom start with a database that they believe to be non-corrupt: Alice runs pg_amcheck. It says that nothing is wrong, and that happens to be true. Bob runs pg_amcheck. It says that there are problems, and there are. Carol runs pg_amcheck. It says that nothing is wrong, but in fact something is wrong. Dan runs pg_amcheck. It says that there are problems, but in fact there are none. Erin runs pg_amcheck. The server crashes. Alice and Bob are clearly in the best shape here, but Carol and Dan arguably haven't been harmed very much. Sure, Carol enjoys a false sense of security, but since she otherwise believed things were OK, the impact of whatever problems exist is evidently not that bad. Dan is worrying over nothing, but the damage is only to his psyche, not his database; we can hope he'll eventually sort out what has happened without grave consequences. Erin, on the other hand, is very possibly in a lot of trouble with her boss and her coworkers. She had what seemed to be a healthy database, and from their perspective, she shot it in the head without any real cause. It will be faint consolation to her and her coworkers that the database was corrupt all along: until she ran the %$! tool, they did not have a problem that affected the ability of their business to generate revenue. 
Now they had an outage, and that does. While I obviously haven't seen this exact scenario play out for a customer, because pg_amcheck is not committed, I have seen similar scenarios over and over. It's REALLY bad when the database goes down. Then the application goes down, and then it gets really ugly. As long as the database was just returning wrong answers or eating data, nobody's boss really cared that much, but now that it's down, they care A LOT. This is of course not to say that nobody cares about the accuracy of results from the database: many people care a lot, and that's why it's good to have tools like this. But we should not underestimate the horror caused by a crash. A working database, even with some wrong data in it, is a problem people would probably like to get fixed. A down database is an emergency. So I think we should actually get a lot more serious about ensuring that corrupt data on disk doesn't cause crashes, even for regular SELECT statements. I don't think we can take an arbitrary
Re: new heapcheck contrib module
On Thu, Nov 19, 2020 at 9:06 AM Robert Haas wrote: > I'm also not sure if these descriptions are clear enough, but it may > also be hard to do a good job in a brief space. Still, comparing this > to the documentation of heapallindexed makes me rather nervous. This > is only trying to verify that the index contains all the tuples in the > heap, not that the values in the heap and index tuples actually match. That's a good point. As things stand, heapallindexed verification does not notice when there are extra index tuples in the index that are in some way inconsistent with the heap. Hopefully this isn't too much of a problem in practice because the presence of extra spurious tuples gets detected by the index structure verification process. But in general that might not happen. Ideally heapallindexed verification would verify 1:1 correspondence. It doesn't do that right now, but it could. This could work by having two bloom filters -- one for the heap, another for the index. The implementation would look for the absence of index tuples that should be in the index initially, just like today. But at the end it would modify the index bloom filter by &= it with the complement of the heap bloom filter. If any bits are left set in the index bloom filter, we go back through the index once more and locate index tuples that have at least some matching bits in the index bloom filter (we cannot expect all of the bits from each of the hash functions used by the bloom filter to still be matches). From here we can do some kind of lookup for maybe-not-matching index tuples that we locate. Make sure that they point to an LP_DEAD line item in the heap or something. Make sure that they have the same values as the heap tuple if they're still retrievable (i.e. if we haven't pruned the heap tuple away already). > This to me seems too conservative. The result is that by default we > check only tables, not indexes. I don't think that's going to be what > users want.
I don't know whether they want the heapallindexed or > rootdescend behaviors for index checks, but I think they want their > indexes checked. Happy to hear opinions from actual users on what they > want; this is just me guessing that you've guessed wrong. :-) My thoughts on these two options: * I don't think that users will ever want rootdescend verification. That option exists now because I wanted to have something that relied on the uniqueness property of B-Tree indexes following the Postgres 12 work. I didn't add retail index tuple deletion, so it seemed like a good idea to have something that makes the same assumptions that it would have to make. To validate the design. Another factor is that Alexander Korotkov made the basic bt_index_parent_check() tests a lot better for Postgres 13. This undermined the practical argument for using rootdescend verification. Finally, note that bt_index_parent_check() was always supposed to be something that was to be used only when you already knew that you had big problems, and wanted absolutely thorough verification without regard for the costs. This isn't the common case at all. It would be reasonable to not expose anything from bt_index_parent_check() at all, or to give it much less prominence. Not really sure of what the right balance is here myself, so I'm not insisting on anything. Just telling you what I know about it. * heapallindexed is kind of expensive, but valuable. But the extra check is probably less likely to help on the second or subsequent index on a table. It might be worth considering an option that only uses it with only one index: Preferably the primary key index, failing that some unique index, and failing that some other index. > This seems pretty lame to me. Even if the btree checker can't tolerate > corruption to the extent that the heap checker does, seg faulting > because of a missing file seems like a bug that we should just fix > (and probably back-patch). 
I'm not very convinced by the decision to > override the user's decision about heapallindexed either. I strongly agree. > Maybe I lack > imagination, but that seems pretty arbitrary. Suppose there's a giant > index which is missing entries for 5 million heap tuples and also > there's 1 entry in the table which has an xmin that is less than the > pg_class.relfrozenxid value by 1. You are proposing that because I have > the latter problem I don't want you to check for the former one. But > I, John Q. Smartuser, do not want you to second-guess what I told you > on the command line that I wanted. :-) Even if your user is just average, they still have one major advantage over the architects of pg_amcheck: actual knowledge of the problem in front of them. > I think in general you're worrying too much about the possibility of > this tool causing backend crashes. I think it's good that you wrote > the heapcheck code in a way that's hardened against that, and I think > we should try to harden other things as time permits. But I don't > think t
Re: new heapcheck contrib module
On Mon, Oct 26, 2020 at 12:12 PM Mark Dilger wrote: > The v20 patches 0002, 0003, and 0005 still apply cleanly, but 0004 required a > rebase. (0001 was already committed last week.) > > Here is a rebased set of 4 patches, numbered 0002..0005 to be consistent with > the previous naming. There are no substantial changes. Here's a review of 0002. I basically like the direction this is going but I guess nobody will be surprised that there are some things in here that I think could be improved. +const char *usage_text[] = { + "pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.", + "", This looks like a novel approach to the problem of printing out the usage() information, and I think that it's inferior to the technique used elsewhere of just having a bunch of printf() statements, because unless I misunderstand, it doesn't permit localization. + " -b, --startblock begin checking table(s) at the given starting block number", + " -e, --endblock check table(s) only up to the given ending block number", + " -B, --toast-startblock begin checking toast table(s) at the given starting block", + " -E, --toast-endblock check toast table(s) only up to the given ending block", I am not very convinced by this. What's the use case? If you're just checking a single table, you might want to specify a start and end block, but then you don't need separate options for the TOAST and non-TOAST cases, do you? If I want to check pg_statistic, I'll say pg_amcheck -t pg_catalog.pg_statistic. If I want to check the TOAST table for pg_statistic, I'll say pg_amcheck -t pg_toast.pg_toast_2619. In either case, if I want to check just the first three blocks, I can add -b 0 -e 2. + " -f, --skip-all-frozen do NOT check blocks marked as all frozen", + " -v, --skip-all-visible do NOT check blocks marked as all visible", I think this is using up too many one character option names for too little benefit on things that are too closely related.
How about, -s, --skip=all-frozen|all-visible|none? And then -v could mean verbose, which could trigger things like printing all the queries sent to the server, setting PQERRORS_VERBOSE, etc. + " -x, --check-indexes check btree indexes associated with tables being checked", + " -X, --skip-indexes do NOT check any btree indexes", + " -i, --index=PATTERN check the specified index(es) only", + " -I, --exclude-index=PATTERN do NOT check the specified index(es)", This is a lotta controls for something that has gotta have some default. Either the default is everything, in which case I don't see why I need -x, or it's nothing, in which case I don't see why I need -X. + " -c, --check-corrupt check indexes even if their associated table is corrupt", + " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt", Ditto. (I think the default should be to check corrupt, and there can be an option to skip it.) + " -a, --heapallindexed check index tuples against the table tuples", + " -A, --no-heapallindexed do NOT check index tuples against the table tuples", Ditto. (Not sure what the default should be, though.) + " -r, --rootdescend search from the root page for each index tuple", + " -R, --no-rootdescend do NOT search from the root page for each index tuple", Ditto. (Again, not sure about the default.) I'm also not sure if these descriptions are clear enough, but it may also be hard to do a good job in a brief space. Still, comparing this to the documentation of heapallindexed makes me rather nervous. This is only trying to verify that the index contains all the tuples in the heap, not that the values in the heap and index tuples actually match. +typedef struct +AmCheckSettings +{ + char *dbname; + char *host; + char *port; + char *username; +} ConnectOptions; Making the struct name different from the type name seems not good, and the struct name also shouldn't be on a separate line. +typedef enum trivalue +{ + TRI_DEFAULT, + TRI_NO, + TRI_YES +} trivalue; Ugh.
It's not this patch's fault, but we really oughta move this to someplace more centralized. +typedef struct ... +} AmCheckSettings; I'm not sure I consider all of these things settings, "db" in particular. But maybe that's nitpicking. +static void expand_schema_name_patterns(const SimpleStringList *patterns, + const SimpleOidList *exclude_oids, + SimpleOidList *oids + bool strict_names); This is copied from pg_dump, along with I think at least one other function from nearby. Unlike the trivalue case above, this would be the first duplication of this logic. Can we push this stuff into pgcommon, perhaps? + /* +* Default beha
Re: new heapcheck contrib module
> On Oct 26, 2020, at 9:12 AM, Tom Lane wrote: > > Robert Haas writes: >> On Wed, Oct 21, 2020 at 11:45 PM Mark Dilger >> wrote: >>> Done that way in the attached, which also include Robert's changes from v19 >>> he posted earlier today. > >> Committed. Let's see what the buildfarm thinks. > > Another thing that the buildfarm is pointing out is > > [WARN] FOUserAgent - The contents of fo:block line 2 exceed the available > area in the inline-progression direction by more than 50 points. (See > position 148863:380) > > This is coming from the sample output for verify_heapam(), which is too > wide to fit in even a normal-size browser window, let alone A4 PDF. > > While we could perhaps hack it up to allow more line breaks, or see > if \x formatting helps, my own suggestion would be to just nuke the > sample output altogether. Ok. > It doesn't look like it is any sort of > representative real output, It is not. It came from artificially created corruption in the regression tests. I may even have manually edited that, though I don't recall. > and it is not useful enough to be worth > spending time to patch up. Ok. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
Robert Haas writes: > On Wed, Oct 21, 2020 at 11:45 PM Mark Dilger > wrote: >> Done that way in the attached, which also include Robert's changes from v19 >> he posted earlier today. > Committed. Let's see what the buildfarm thinks. Another thing that the buildfarm is pointing out is [WARN] FOUserAgent - The contents of fo:block line 2 exceed the available area in the inline-progression direction by more than 50 points. (See position 148863:380) This is coming from the sample output for verify_heapam(), which is too wide to fit in even a normal-size browser window, let alone A4 PDF. While we could perhaps hack it up to allow more line breaks, or see if \x formatting helps, my own suggestion would be to just nuke the sample output altogether. It doesn't look like it is any sort of representative real output, and it is not useful enough to be worth spending time to patch up. regards, tom lane
Re: new heapcheck contrib module
Hi, On October 26, 2020 7:13:15 AM PDT, Tom Lane wrote: >Robert Haas writes: >> On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger >> wrote: hornet is still unhappy even after 321633e17b07968e68ca5341429e2c8bbf15c331? > >>> That appears to be a failed test for pg_surgery rather than for >amcheck. Or am I reading the log wrong? > >> Oh, yeah, you're right. I don't know why it just failed now, though: >> there are a bunch of successful runs preceding it. But I guess it's >> unrelated to this thread. > >pg_surgery's been unstable since it went in. I believe Andres is >working on a fix. I posted one a while ago - was planning to push a cleaned up version soon if nobody comments in the near future. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: new heapcheck contrib module
Robert Haas writes: > On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger > wrote: >>> hornet is still unhappy even after >>> 321633e17b07968e68ca5341429e2c8bbf15c331? >> That appears to be a failed test for pg_surgery rather than for amcheck. Or >> am I reading the log wrong? > Oh, yeah, you're right. I don't know why it just failed now, though: > there are a bunch of successful runs preceding it. But I guess it's > unrelated to this thread. pg_surgery's been unstable since it went in. I believe Andres is working on a fix. regards, tom lane
Re: new heapcheck contrib module
> On Oct 26, 2020, at 7:00 AM, Robert Haas wrote: > > On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger > wrote: >> Much of the test in 0002 could be ported to work without committing the rest >> of 0002, if the pg_amcheck command line utility is not wanted. > > How much consensus do we think we have around 0002 at this point? I > think I remember a vote in favor and no votes against, but I haven't > been paying a whole lot of attention. My sense over the course of the thread is that people were very much in favor of having heap checking functionality, but quite vague on whether they wanted the command line interface. I think the interface is useful, but I'd rather hear from others on this list whether it is useful enough to justify maintaining it. As the author of it, I'm biased. Hopefully others with a more objective view of the matter will read this and vote? I don't recall patches 0003 through 0005 getting any votes. 0003 and 0004, which create and use a non-throwing interface to clog, were written in response to Andrey's request, so I'm guessing that's kind of a vote in favor. 0005 was factored out of 0001 in response to a lack of agreement about whether verify_heapam should have acl checks. You seemed in favor, and Peter against, but I don't think we heard other opinions. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger wrote: > Much of the test in 0002 could be ported to work without committing the rest > of 0002, if the pg_amcheck command line utility is not wanted. How much consensus do we think we have around 0002 at this point? I think I remember a vote in favor and no votes against, but I haven't been paying a whole lot of attention. > > Thanks for committing (and adjusting) the patches for the existing > > buildfarm failures. If I understand the buildfarm results correctly, > > hornet is still unhappy even after > > 321633e17b07968e68ca5341429e2c8bbf15c331? > > That appears to be a failed test for pg_surgery rather than for amcheck. Or > am I reading the log wrong? Oh, yeah, you're right. I don't know why it just failed now, though: there are a bunch of successful runs preceding it. But I guess it's unrelated to this thread. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 26, 2020, at 6:37 AM, Robert Haas wrote: > > On Fri, Oct 23, 2020 at 2:04 PM Tom Lane wrote: >> Seems to work, so I pushed it (after some compulsive fooling >> about with whitespace and perltidy-ing). It appears to me that >> the code coverage for verify_heapam.c is not very good though, >> only circa 50%. Do we care to expend more effort on that? > > There are two competing goods here. On the one hand, more test > coverage is better than less. On the other hand, finicky tests that > have platform-dependent results or fail for strange reasons not > indicative of actual problems with the code are often judged not to be > worth the trouble. An early version of this patch set had a very > extensive chunk of Perl code in it that actually understood the page > layout and, if we adopt something like that, it would probably be > easier to test a whole bunch of scenarios. The downside is that it was > a lot of code that basically duplicated a lot of backend logic in > Perl, and I was (and am) afraid that people will complain about the > amount of code and/or the difficulty of maintaining it. On the other > hand, having all that code might allow better testing not only of this > particular patch but also other scenarios involving corrupted pages, > so maybe it's wrong to view all that code as a burden that we have to > carry specifically to test this; or, alternatively, maybe it's worth > carrying even if we only use it for this. On the third hand, as Mark > points out, if we get 0002 committed, that will help somewhat with > test coverage even if we do nothing else. Much of the test in 0002 could be ported to work without committing the rest of 0002, if the pg_amcheck command line utility is not wanted. > > Thanks for committing (and adjusting) the patches for the existing > buildfarm failures. If I understand the buildfarm results correctly, > hornet is still unhappy even after > 321633e17b07968e68ca5341429e2c8bbf15c331?
That appears to be a failed test for pg_surgery rather than for amcheck. Or am I reading the log wrong? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Fri, Oct 23, 2020 at 2:04 PM Tom Lane wrote: > Seems to work, so I pushed it (after some compulsive fooling > about with whitespace and perltidy-ing). It appears to me that > the code coverage for verify_heapam.c is not very good though, > only circa 50%. Do we care to expend more effort on that? There are two competing goods here. On the one hand, more test coverage is better than less. On the other hand, finicky tests that have platform-dependent results or fail for strange reasons not indicative of actual problems with the code are often judged not to be worth the trouble. An early version of this patch set had a very extensive chunk of Perl code in it that actually understood the page layout and, if we adopt something like that, it would probably be easier to test a whole bunch of scenarios. The downside is that it was a lot of code that basically duplicated a lot of backend logic in Perl, and I was (and am) afraid that people will complain about the amount of code and/or the difficulty of maintaining it. On the other hand, having all that code might allow better testing not only of this particular patch but also other scenarios involving corrupted pages, so maybe it's wrong to view all that code as a burden that we have to carry specifically to test this; or, alternatively, maybe it's worth carrying even if we only use it for this. On the third hand, as Mark points out, if we get 0002 committed, that will help somewhat with test coverage even if we do nothing else. Thanks for committing (and adjusting) the patches for the existing buildfarm failures. If I understand the buildfarm results correctly, hornet is still unhappy even after 321633e17b07968e68ca5341429e2c8bbf15c331? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 23, 2020, at 4:12 PM, Tom Lane wrote: > > Mark Dilger writes: >> You certainly appear to be right about that. I've added the extra checks, >> and extended the regression test to include them. Patch attached. > > Pushed with some more fooling about. The "bit reversal" idea is not > a sufficient guide to picking values that will hit all the code checks. > For instance, I was seeing out-of-range warnings on one endianness and > not the other because on the other one the maxalign check rejected the > value first. I ended up manually tweaking the corruption patterns > until they hit all the tests on both endiannesses. Pretty much the > opposite of black-box testing, but it's not like our notions of line > pointer layout are going to change anytime soon. > > I made some logic rearrangements in the C code, too. Thanks Tom! And Peter, your comment earlier save me some time. Thanks to you, also! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
Mark Dilger writes: > You certainly appear to be right about that. I've added the extra checks, > and extended the regression test to include them. Patch attached. Pushed with some more fooling about. The "bit reversal" idea is not a sufficient guide to picking values that will hit all the code checks. For instance, I was seeing out-of-range warnings on one endianness and not the other because on the other one the maxalign check rejected the value first. I ended up manually tweaking the corruption patterns until they hit all the tests on both endiannesses. Pretty much the opposite of black-box testing, but it's not like our notions of line pointer layout are going to change anytime soon. I made some logic rearrangements in the C code, too. regards, tom lane
Re: new heapcheck contrib module
Mark Dilger writes: >> On Oct 23, 2020, at 11:51 AM, Tom Lane wrote: >> I do not have 64-bit big-endian hardware to play with unfortunately. >> But what I suspect is happening here is less about endianness and >> more about alignment pickiness; or maybe we were unlucky enough to >> index off the end of the shmem segment. > You certainly appear to be right about that. I've added the extra checks, > and extended the regression test to include them. Patch attached. Meanwhile, I've replicated the SIGBUS problem on gaur's host, so that's definitely what's happening. (Although PPC is also alignment-picky on the hardware level, I believe that both macOS and Linux try to mask that by having kernel trap handlers execute unaligned accesses, leaving only a nasty performance loss behind. That's why I failed to see this effect when checking your previous patch on an old Apple box. We likely won't see it in the buildfarm either, unless maybe on Noah's AIX menagerie.) I'll check this patch on gaur and push it if it's clean. regards, tom lane
Re: new heapcheck contrib module
> On Oct 23, 2020, at 11:51 AM, Tom Lane wrote: > > Hmm, we're not out of the woods yet: thorntail is even less happy > than before. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-23%2018%3A08%3A11 > > I do not have 64-bit big-endian hardware to play with unfortunately. > But what I suspect is happening here is less about endianness and > more about alignment pickiness; or maybe we were unlucky enough to > index off the end of the shmem segment. I see that verify_heapam > does this for non-redirect tuples: > >/* Set up context information about this next tuple */ >ctx.lp_len = ItemIdGetLength(ctx.itemid); >ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); >ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); > > with absolutely no thought for the possibility that lp_off is out of > range or not maxaligned. The checks for a sane lp_len seem to have > gone missing as well. You certainly appear to be right about that. I've added the extra checks, and extended the regression test to include them. Patch attached. v23-0001-Sanity-checking-line-pointers.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Fri, Oct 23, 2020 at 11:51 AM Tom Lane wrote: > /* Set up context information about this next tuple */ > ctx.lp_len = ItemIdGetLength(ctx.itemid); > ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); > ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); > > with absolutely no thought for the possibility that lp_off is out of > range or not maxaligned. The checks for a sane lp_len seem to have > gone missing as well. That is surprising. verify_nbtree.c has PageGetItemIdCareful() for this exact reason. -- Peter Geoghegan
Re: new heapcheck contrib module
Hmm, we're not out of the woods yet: thorntail is even less happy than before. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-23%2018%3A08%3A11 I do not have 64-bit big-endian hardware to play with unfortunately. But what I suspect is happening here is less about endianness and more about alignment pickiness; or maybe we were unlucky enough to index off the end of the shmem segment. I see that verify_heapam does this for non-redirect tuples: /* Set up context information about this next tuple */ ctx.lp_len = ItemIdGetLength(ctx.itemid); ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid); ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr); with absolutely no thought for the possibility that lp_off is out of range or not maxaligned. The checks for a sane lp_len seem to have gone missing as well. regards, tom lane
Re: new heapcheck contrib module
> On Oct 23, 2020, at 11:04 AM, Tom Lane wrote: > > I wrote: >> Mark Dilger writes: >>> The patch I *should* have attached last night this time: > >> Thanks, I'll do some big-endian testing with this. > > Seems to work, so I pushed it (after some compulsive fooling > about with whitespace and perltidy-ing). Thanks for all the help! > It appears to me that > the code coverage for verify_heapam.c is not very good though, > only circa 50%. Do we care to expend more effort on that? Part of the issue here is that I developed the heapcheck code as a sequence of patches, and there is much greater coverage in the tests in the 0002 patch, which hasn't been committed yet. (Nor do I know that it ever will be.) Over time, the patch set became: 0001 -- adds verify_heapam.c to contrib/amcheck, with basic test coverage 0002 -- adds pg_amcheck command line interface to contrib/pg_amcheck, with more extensive test coverage 0003 -- creates a non-throwing interface to clog 0004 -- uses the non-throwing clog interface from within verify_heapam 0005 -- adds some controversial acl checks to verify_heapam Your question doesn't have much to do with 3,4,5 above, but it definitely matters whether we're going to commit 0002. The test in that patch, in contrib/pg_amcheck/t/004_verify_heapam.pl, does quite a bit of bit twiddling of heap tuples and toast records and checks that the right corruption messages come back. Part of the reason I was trying to keep 0001's t/001_verify_heapam.pl test ignorant of the exact page layout information is that I already had this other test that covers that. So, should I port that test from (currently non-existent) contrib/pg_amcheck into contrib/amcheck, or should we wait to see if the 0002 patch is going to get committed? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
I wrote: > Mark Dilger writes: >> The patch I *should* have attached last night this time: > Thanks, I'll do some big-endian testing with this. Seems to work, so I pushed it (after some compulsive fooling about with whitespace and perltidy-ing). It appears to me that the code coverage for verify_heapam.c is not very good though, only circa 50%. Do we care to expend more effort on that? regards, tom lane
Re: new heapcheck contrib module
Mark Dilger writes: > The patch I *should* have attached last night this time: Thanks, I'll do some big-endian testing with this. regards, tom lane
Re: new heapcheck contrib module
> On Oct 22, 2020, at 9:21 PM, Mark Dilger wrote: > > > >> On Oct 22, 2020, at 7:01 PM, Tom Lane wrote: >> >> Mark Dilger writes: >>> Ahh, crud. It's because >>> syswrite($fh, '\x77\x77\x77\x77', 500) >>> is wrong twice. The 500 was wrong, but the string there isn't the bit >>> pattern we want -- it's just a string literal with backslashes and such. >>> It should have been double-quoted. >> >> Argh. So we really have, using same test except >> >> memcpy(&lp, "\\x77", sizeof(lp)); >> >> little endian: off = 785c, flags = 2, len = 1b9b >> big endian: off = 2e3c, flags = 0, len = 3737 >> >> which explains the apparent LP_DEAD result. >> >> I'm not particularly on board with your suggestion of "well, if it works >> sometimes then it's okay". Then we have no idea of what we really tested. >> >> regards, tom lane > > Ok, I've pruned it down to something you may like better. Instead of just > checking that *some* corruption occurs, it checks the returned corruption > against an expected regex, and if it fails to match, you should see in the > logs what you got vs. what you expected. > > It only corrupts the first two line pointers, the first one with 0x > and the second one with 0x, which are consciously chosen to be > bitwise reverses of each other and just strings of alternating bits rather > than anything that could have a more complicated interpretation. > > On my little-endian mac, the 0x value creates a line pointer which > redirects to an invalid offset 0x, which gets reported as decimal 30583 > in the corruption report, "line pointer redirection to item at offset 30583 > exceeds maximum offset 38". The test is indifferent to whether the > corruption it is looking for is reported relative to the first line pointer > or the second one, so if endian-ness matters, it may be the 0x that > results in that corruption message. I don't have a machine handy to test > that. 
It would be nice to determine the minimum amount of paranoia necessary > to make this portable and not commit the rest. Obviously, that should have said 0x and 0x. After writing the patch that way, I checked that the old value 0x also works on my mac, which it does, and checked that writing the line pointers starting at offset 24 rather than 32 works on my mac, which it does, and then went on to write this rather confused email and attached the patch with those changes, which all work (at least on my mac) but are potentially less portable than what I had before testing those changes. I apologize for any confusion my email from last night may have caused. The patch I *should* have attached last night this time: regress.patch.WIP.2 Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 22, 2020, at 7:01 PM, Tom Lane wrote: > > Mark Dilger writes: >> Ahh, crud. It's because >> syswrite($fh, '\x77\x77\x77\x77', 500) >> is wrong twice. The 500 was wrong, but the string there isn't the bit >> pattern we want -- it's just a string literal with backslashes and such. It >> should have been double-quoted. > > Argh. So we really have, using same test except > > memcpy(&lp, "\\x77", sizeof(lp)); > > little endian:off = 785c, flags = 2, len = 1b9b > big endian: off = 2e3c, flags = 0, len = 3737 > > which explains the apparent LP_DEAD result. > > I'm not particularly on board with your suggestion of "well, if it works > sometimes then it's okay". Then we have no idea of what we really tested. > > regards, tom lane Ok, I've pruned it down to something you may like better. Instead of just checking that *some* corruption occurs, it checks the returned corruption against an expected regex, and if it fails to match, you should see in the logs what you got vs. what you expected. It only corrupts the first two line pointers, the first one with 0x and the second one with 0x, which are consciously chosen to be bitwise reverses of each other and just strings of alternating bits rather than anything that could have a more complicated interpretation. On my little-endian mac, the 0x value creates a line pointer which redirects to an invalid offset 0x, which gets reported as decimal 30583 in the corruption report, "line pointer redirection to item at offset 30583 exceeds maximum offset 38". The test is indifferent to whether the corruption it is looking for is reported relative to the first line pointer or the second one, so if endian-ness matters, it may be the 0x that results in that corruption message. I don't have a machine handy to test that. It would be nice to determine the minimum amount of paranoia necessary to make this portable and not commit the rest. 
regress.patch.WIP Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
Mark Dilger writes: > Ahh, crud. It's because > syswrite($fh, '\x77\x77\x77\x77', 500) > is wrong twice. The 500 was wrong, but the string there isn't the bit > pattern we want -- it's just a string literal with backslashes and such. It > should have been double-quoted. Argh. So we really have, using same test except memcpy(&lp, "\\x77", sizeof(lp)); little endian: off = 785c, flags = 2, len = 1b9b big endian: off = 2e3c, flags = 0, len = 3737 which explains the apparent LP_DEAD result. I'm not particularly on board with your suggestion of "well, if it works sometimes then it's okay". Then we have no idea of what we really tested. regards, tom lane
Re: new heapcheck contrib module
> On Oct 22, 2020, at 6:50 PM, Mark Dilger wrote: > > > >> On Oct 22, 2020, at 6:46 PM, Tom Lane wrote: >> >> I wrote: >>> I get >>> off = , flags = 2, len = 3bbb >>> on a little-endian machine, and >>> off = 3bbb, flags = 2, len = >>> on big-endian. It'd be less symmetric if the bytes weren't >>> all the same ... >> >> ... but given that this is the test value we are using, why >> don't both endiannesses whine about a non-maxalign'd offset? >> The code really shouldn't even be trying to follow these >> redirects, because we risk SIGBUS on picky architectures. > > Ahh, crud. It's because > > syswrite($fh, '\x77\x77\x77\x77', 500) > > is wrong twice. The 500 was wrong, but the string there isn't the bit > pattern we want -- it's just a string literal with backslashes and such. It > should have been double-quoted. The reason this never came up in testing is what I was talking about elsewhere -- this test isn't designed to create *specific* corruptions. It's just supposed to corrupt the table in some random way. For whatever reasons I'm not too curious about, that string corrupts on little endian machines but not big endian machines. If we want to have a test that tailors very specific corruptions, I don't think the way to get there is by debugging this test. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 22, 2020, at 6:46 PM, Tom Lane wrote: > > I wrote: >> I get >> off = 7777, flags = 2, len = 3bbb >> on a little-endian machine, and >> off = 3bbb, flags = 2, len = 7777 >> on big-endian. It'd be less symmetric if the bytes weren't >> all the same ... > > ... but given that this is the test value we are using, why > don't both endiannesses whine about a non-maxalign'd offset? > The code really shouldn't even be trying to follow these > redirects, because we risk SIGBUS on picky architectures. Ahh, crud. It's because syswrite($fh, '\x77\x77\x77\x77', 500) is wrong twice. The 500 was wrong, but the string there isn't the bit pattern we want -- it's just a string literal with backslashes and such. It should have been double-quoted. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 22, 2020, at 6:41 PM, Tom Lane wrote: > > I wrote: >> So now I think this is a REDIRECT on either architecture, but the >> offset and length fields have different values, causing the redirect >> pointer to point to different places. Maybe it happens to point >> at a DEAD tuple in the big-endian case. > > Just to make sure, I tried this test program: > > #include <stdio.h> > #include <string.h> > > typedef struct ItemIdData > { > unsigned lp_off:15, /* offset to tuple (from start of page) */ > lp_flags:2, /* state of line pointer, see below */ > lp_len:15; /* byte length of tuple */ > } ItemIdData; > > int main() > { > ItemIdData lp; > > memset(&lp, 0x77, sizeof(lp)); > printf("off = %x, flags = %x, len = %x\n", > lp.lp_off, lp.lp_flags, lp.lp_len); > return 0; > } > > I get > > off = 7777, flags = 2, len = 3bbb > > on a little-endian machine, and > > off = 3bbb, flags = 2, len = 7777 > > on big-endian. It'd be less symmetric if the bytes weren't > all the same ... I think we're going in the wrong direction here. The idea behind this test was to have as little knowledge about the layout of pages as possible and still verify that damaging the pages would result in corruption reports. Of course, not all damage will result in corruption reports, because some damage looks legit. I think it was just luck (good or bad depending on your perspective) that the damage in the test as committed works on little-endian but not big-endian. I can embed this knowledge that you have researched into the test if you want me to, but my instinct is to go the other direction and have even less knowledge about pages in the test. That would work if, instead of expecting corruption every time the test writes the file, we just made sure that it gets corruption reports at least some of the times that it does so. That seems more maintainable long term. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
I wrote: > I get > off = 7777, flags = 2, len = 3bbb > on a little-endian machine, and > off = 3bbb, flags = 2, len = 7777 > on big-endian. It'd be less symmetric if the bytes weren't > all the same ... ... but given that this is the test value we are using, why don't both endiannesses whine about a non-maxalign'd offset? The code really shouldn't even be trying to follow these redirects, because we risk SIGBUS on picky architectures. regards, tom lane
Re: new heapcheck contrib module
I wrote: > So now I think this is a REDIRECT on either architecture, but the > offset and length fields have different values, causing the redirect > pointer to point to different places. Maybe it happens to point > at a DEAD tuple in the big-endian case. Just to make sure, I tried this test program:

#include <stdio.h>
#include <string.h>

typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* offset to tuple (from start of page) */
                lp_flags:2,     /* state of line pointer, see below */
                lp_len:15;      /* byte length of tuple */
} ItemIdData;

int main()
{
    ItemIdData  lp;

    memset(&lp, 0x77, sizeof(lp));
    printf("off = %x, flags = %x, len = %x\n",
           lp.lp_off, lp.lp_flags, lp.lp_len);
    return 0;
}

I get

off = 7777, flags = 2, len = 3bbb

on a little-endian machine, and

off = 3bbb, flags = 2, len = 7777

on big-endian. It'd be less symmetric if the bytes weren't all the same ... regards, tom lane
Re: new heapcheck contrib module
Mark Dilger writes: >> On Oct 22, 2020, at 2:06 PM, Tom Lane wrote: >> Oh, wait a second. ItemIdData has the flag bits in the middle: >> meaning that for that particular bit pattern, one endianness >> is going to see the flags as 01 (LP_NORMAL) and the other as 10 >> (LP_REDIRECT). > Well, the issue is that on big-endian machines it is not reporting any > corruption at all. Are you sure the difference will be LP_NORMAL vs > LP_REDIRECT? [ thinks a bit harder... ] Probably not. The byte/bit string looks the same either way, given that it's four repetitions of the same byte value. But which field is which will differ: we have either

oooFFlll
01110111011101110111011101110111

or

lllFFooo
01110111011101110111011101110111

So now I think this is a REDIRECT on either architecture, but the offset and length fields have different values, causing the redirect pointer to point to different places. Maybe it happens to point at a DEAD tuple in the big-endian case. regards, tom lane
Re: new heapcheck contrib module
On Thu, Oct 22, 2020 at 4:04 PM Mark Dilger wrote: > I think the compiler warning was about fxid not being set. The callers pass > NULL for status if they don't want status checked, so writing *status > unconditionally would be an error. Also, if the xid being checked is out of > bounds, we can't check the status of the xid in clog. Sorry, you're (partly) right. The new logic makes it a lot more clear that we never use that variable uninitialized. I'll remove the extra semi-colon and commit this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Thu, Oct 22, 2020 at 2:39 PM Mark Dilger wrote: > > This is great work. Thanks Mark and Robert. > > That's the first time I've laughed today. Having turned the build-farm red, > this is quite ironic feedback! Thanks all the same for the sentiment. Breaking the buildfarm is not a capital offense. Especially when it happens with patches that are in some sense low level and/or novel, and therefore inherently more likely to cause trouble. -- Peter Geoghegan
Re: new heapcheck contrib module
> On Oct 22, 2020, at 2:26 PM, Peter Geoghegan wrote: > > On Thu, Oct 22, 2020 at 5:51 AM Robert Haas wrote: >> Committed. Let's see what the buildfarm thinks. > > This is great work. Thanks Mark and Robert. That's the first time I've laughed today. Having turned the build-farm red, this is quite ironic feedback! Thanks all the same for the sentiment. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Thu, Oct 22, 2020 at 5:51 AM Robert Haas wrote: > Committed. Let's see what the buildfarm thinks. This is great work. Thanks Mark and Robert. -- Peter Geoghegan
Re: new heapcheck contrib module
Mark Dilger writes: > Yeah, I'm already looking at that. The logic in verify_heapam skips over > line pointers that are unused or dead, and the test is reporting zero > corruption (and complaining about that), so it's probably not going to help > to overwrite all the line pointers with this particular bit pattern any more > than to just overwrite the first one, as it would just skip them all. > I think the test should overwrite the line pointers with a variety of > different bit patterns, or one calculated to work on all platforms. I'll > have to write that up. What we need here is to produce the same test results on either endianness. So probably the thing to do is apply the equivalent of ntohl() to produce a string that looks right for either host endianness. As a separate matter, you'd want to test corruption producing any of the four flag bitpatterns, probably. It says here you can use Perl's pack/unpack functions to get the equivalent of ntohl(), but I've not troubled to work out how. regards, tom lane
Re: new heapcheck contrib module
> On Oct 22, 2020, at 2:06 PM, Tom Lane wrote: > > I wrote: >> Mark Dilger writes: >>> It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is >>> little-endian, and ppc32 and sparc64 are both big-endian, right? > >> They are, but that should not meaningfully affect the results of >> that corruption step. You zapped only one line pointer not >> several, but it would look the same regardless of endianness. > > Oh, wait a second. ItemIdData has the flag bits in the middle: > > typedef struct ItemIdData > { > unsigned lp_off:15, /* offset to tuple (from start of page) */ > lp_flags:2, /* state of line pointer, see below */ > lp_len:15; /* byte length of tuple */ > } ItemIdData; > > meaning that for that particular bit pattern, one endianness > is going to see the flags as 01 (LP_NORMAL) and the other as 10 > (LP_REDIRECT). The offset/len are corrupt either way, but > I'd certainly expect that amcheck would produce different > complaints about those two cases. So it's unsurprising if > this test case's output is endian-dependent. Well, the issue is that on big-endian machines it is not reporting any corruption at all. Are you sure the difference will be LP_NORMAL vs LP_REDIRECT? I was thinking it was LP_DEAD vs LP_REDIRECT, as the little-endian platforms are seeing corruption messages about bad redirect line pointers, and the big-endian are apparently skipping over the line pointer entirely, which makes sense if it is LP_DEAD but not if it is LP_NORMAL. It would also skip over LP_UNUSED, but I don't see how that could be stored in lp_flags, because 0x77 is going to either be 01110111 or 11101110, and in neither case do you get two zeros adjacent, but you could get two ones adjacent. (LP_UNUSED = binary 00 and LP_DEAD = binary 11) — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 22, 2020, at 2:06 PM, Tom Lane wrote: > > I wrote: >> Mark Dilger writes: >>> It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is >>> little-endian, and ppc32 and sparc64 are both big-endian, right? > >> They are, but that should not meaningfully affect the results of >> that corruption step. You zapped only one line pointer not >> several, but it would look the same regardless of endianness. > > Oh, wait a second. ItemIdData has the flag bits in the middle: > > typedef struct ItemIdData > { > unsigned lp_off:15, /* offset to tuple (from start of page) */ > lp_flags:2, /* state of line pointer, see below */ > lp_len:15; /* byte length of tuple */ > } ItemIdData; > > meaning that for that particular bit pattern, one endianness > is going to see the flags as 01 (LP_NORMAL) and the other as 10 > (LP_REDIRECT). The offset/len are corrupt either way, but > I'd certainly expect that amcheck would produce different > complaints about those two cases. So it's unsurprising if > this test case's output is endian-dependent. Yeah, I'm already looking at that. The logic in verify_heapam skips over line pointers that are unused or dead, and the test is reporting zero corruption (and complaining about that), so it's probably not going to help to overwrite all the line pointers with this particular bit pattern any more than to just overwrite the first one, as it would just skip them all. I think the test should overwrite the line pointers with a variety of different bit patterns, or one calculated to work on all platforms. I'll have to write that up. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
I wrote: > Mark Dilger writes: >> It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is >> little-endian, and ppc32 and sparc64 are both big-endian, right? > They are, but that should not meaningfully affect the results of > that corruption step. You zapped only one line pointer not > several, but it would look the same regardless of endianness. Oh, wait a second. ItemIdData has the flag bits in the middle:

typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* offset to tuple (from start of page) */
                lp_flags:2,     /* state of line pointer, see below */
                lp_len:15;      /* byte length of tuple */
} ItemIdData;

meaning that for that particular bit pattern, one endianness is going to see the flags as 01 (LP_NORMAL) and the other as 10 (LP_REDIRECT). The offset/len are corrupt either way, but I'd certainly expect that amcheck would produce different complaints about those two cases. So it's unsurprising if this test case's output is endian-dependent. regards, tom lane
Re: new heapcheck contrib module
Mark Dilger writes: >> On Oct 22, 2020, at 1:31 PM, Tom Lane wrote: >> Hm, but why are we seeing the failure only on specific machine >> architectures? sparc64 and ppc32 is a weird pairing, too. > It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is > little-endian, and ppc32 and sparc64 are both big-endian, right? They are, but that should not meaningfully affect the results of that corruption step. You zapped only one line pointer, not several, but it would look the same regardless of endianness. I find it more plausible that we might see the bad effects of the uninitialized variable only on those arches --- but that theory is still pretty shaky, since you'd think compiler choices about register or stack-location assignment would be the controlling factor, and those should be all over the map. regards, tom lane
Re: new heapcheck contrib module
> On Oct 22, 2020, at 1:31 PM, Tom Lane wrote: > > Mark Dilger writes: >>> On Oct 22, 2020, at 1:09 PM, Tom Lane wrote: >>> ooh, looks like prairiedog sees the problem too. That means I should be >>> able to reproduce it under a debugger, if you're not certain yet where >>> the problem lies. > >> Thanks, Tom, but I question whether the regression test failures are from a >> problem in the verify_heapam.c code. I think they are a busted perl test. >> The test was supposed to corrupt the heap by overwriting a heap file with a >> large chunk of garbage, but in fact only wrote a small amount of garbage. >> The idea was to write about 2000 bytes starting at offset 32 in the page, in >> order to corrupt the line pointers, but owing to my incorrect use of >> syswrite in the perl test, that didn't happen. > > Hm, but why are we seeing the failure only on specific machine > architectures? sparc64 and ppc32 is a weird pairing, too. It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is little-endian, and ppc32 and sparc64 are both big-endian, right? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
Mark Dilger writes: >> On Oct 22, 2020, at 1:09 PM, Tom Lane wrote: >> ooh, looks like prairiedog sees the problem too. That means I should be >> able to reproduce it under a debugger, if you're not certain yet where >> the problem lies. > Thanks, Tom, but I question whether the regression test failures are from a > problem in the verify_heapam.c code. I think they are a busted perl test. > The test was supposed to corrupt the heap by overwriting a heap file with a > large chunk of garbage, but in fact only wrote a small amount of garbage. > The idea was to write about 2000 bytes starting at offset 32 in the page, in > order to corrupt the line pointers, but owing to my incorrect use of syswrite > in the perl test, that didn't happen. Hm, but why are we seeing the failure only on specific machine architectures? sparc64 and ppc32 is a weird pairing, too. regards, tom lane
Re: new heapcheck contrib module
> On Oct 22, 2020, at 1:23 PM, Tom Lane wrote: > > ... btw, having now looked more closely at get_xid_status(), I wonder > how come there aren't more compilers bitching about it, because it > is very very obviously broken. In particular, the case of > requesting status for an xid that is BootstrapTransactionId or > FrozenTransactionId *will* fall through to perform > FullTransactionIdPrecedesOrEquals with an uninitialized fxid. > > The fact that most compilers seem to fail to notice that is quite scary. > I suppose it has something to do with FullTransactionId being a struct, > which makes me wonder if that choice was quite as wise as we thought. > > Meanwhile, so far as this code goes, I wonder why you don't just change it > to always set that value, ie > > XidBoundsViolation result; > FullTransactionId fxid; > FullTransactionId clog_horizon; > > + fxid = FullTransactionIdFromXidAndCtx(xid, ctx); > + > /* Quick check for special xids */ > if (!TransactionIdIsValid(xid)) > result = XID_INVALID; > else if (xid == BootstrapTransactionId || xid == FrozenTransactionId) > result = XID_BOUNDS_OK; > else > { > /* Check if the xid is within bounds */ > - fxid = FullTransactionIdFromXidAndCtx(xid, ctx); > if (!fxid_in_cached_range(fxid, ctx)) > { Yeah, I reached the same conclusion before submitting the fix upthread. I structured it a bit differently, but I believe fxid will now always get set before being used, though sometimes the function returns before doing either. I had the same thought about compilers not catching that, too. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 22, 2020, at 1:09 PM, Tom Lane wrote: > > ooh, looks like prairiedog sees the problem too. That means I should be > able to reproduce it under a debugger, if you're not certain yet where > the problem lies. Thanks, Tom, but I question whether the regression test failures are from a problem in the verify_heapam.c code. I think they are a busted perl test. The test was supposed to corrupt the heap by overwriting a heap file with a large chunk of garbage, but in fact only wrote a small amount of garbage. The idea was to write about 2000 bytes starting at offset 32 in the page, in order to corrupt the line pointers, but owing to my incorrect use of syswrite in the perl test, that didn't happen. I think the uninitialized variable warning is warning about a real problem in the c-code, but I have no reason to think that particular problem is causing this particular regression test failure. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
... btw, having now looked more closely at get_xid_status(), I wonder how come there aren't more compilers bitching about it, because it is very very obviously broken. In particular, the case of requesting status for an xid that is BootstrapTransactionId or FrozenTransactionId *will* fall through to perform FullTransactionIdPrecedesOrEquals with an uninitialized fxid. The fact that most compilers seem to fail to notice that is quite scary. I suppose it has something to do with FullTransactionId being a struct, which makes me wonder if that choice was quite as wise as we thought. Meanwhile, so far as this code goes, I wonder why you don't just change it to always set that value, ie

    XidBoundsViolation result;
    FullTransactionId fxid;
    FullTransactionId clog_horizon;

+   fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+
    /* Quick check for special xids */
    if (!TransactionIdIsValid(xid))
        result = XID_INVALID;
    else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
        result = XID_BOUNDS_OK;
    else
    {
        /* Check if the xid is within bounds */
-       fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
        if (!fxid_in_cached_range(fxid, ctx))
        {

regards, tom lane
Re: new heapcheck contrib module
ooh, looks like prairiedog sees the problem too. That means I should be able to reproduce it under a debugger, if you're not certain yet where the problem lies. regards, tom lane
Re: new heapcheck contrib module
> On Oct 22, 2020, at 1:00 PM, Robert Haas wrote: > > On Thu, Oct 22, 2020 at 3:15 PM Mark Dilger > wrote: >> The 0001 attached patch addresses the -Werror=maybe-uninitialized problem. > > I am skeptical. Why so much code churn to fix a compiler warning? And > even in the revised code, *status isn't set in all cases, so I don't > see why this would satisfy the compiler. Even if it satisfies this > particular compiler for some other reason, some other compiler is > bound to be unhappy sometime. It's better to just arrange to set > *status always, and use a dummy value in cases where it doesn't > matter. Also, "return XID_BOUNDS_OK;;" has exceeded its recommended > allowance of semicolons. I think the compiler warning was about fxid not being set. The callers pass NULL for status if they don't want status checked, so writing *status unconditionally would be an error. Also, if the xid being checked is out of bounds, we can't check the status of the xid in clog. As for the code churn, I probably refactored it a bit more than I needed to fix the compiler warning about fxid, but that was because the old arrangement seemed to make it harder to reason about when and where fxid got set. I think that is more clear now. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Thu, Oct 22, 2020 at 3:15 PM Mark Dilger wrote: > The 0001 attached patch addresses the -Werror=maybe-uninitialized problem. I am skeptical. Why so much code churn to fix a compiler warning? And even in the revised code, *status isn't set in all cases, so I don't see why this would satisfy the compiler. Even if it satisfies this particular compiler for some other reason, some other compiler is bound to be unhappy sometime. It's better to just arrange to set *status always, and use a dummy value in cases where it doesn't matter. Also, "return XID_BOUNDS_OK;;" has exceeded its recommended allowance of semicolons. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 22, 2020, at 9:01 AM, Mark Dilger wrote: > > > >> On Oct 22, 2020, at 7:06 AM, Robert Haas wrote: >> >> On Thu, Oct 22, 2020 at 8:51 AM Robert Haas wrote: >>> Committed. Let's see what the buildfarm thinks. >> >> It is mostly happy, but thorntail is not: >> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-22%2012%3A58%3A11 >> >> I thought that the problem might be related to the fact that thorntail >> is using force_parallel_mode, but I tried that here and it did not >> cause a failure. So my next guess is that it is related to the fact >> that this is a sparc64 machine, but it's hard to tell, since none of >> the other sparc64 critters have run yet. In any case I don't know why >> that would cause a failure. The messages in the log aren't very >> illuminating, unfortunately. :-( >> >> Mark, any ideas what might cause specifically that set of tests to fail? > > The code is correctly handling an uncorrupted table, but then more or less > randomly failing some of the time when processing a corrupt table. > > Tom identified a problem with an uninitialized variable. I'm putting > together a new patch set to address it. The 0001 attached patch addresses the -Werror=maybe-uninitialized problem. The 0002 attached patch addresses the test failures: The failing test is designed to stop the server, create blunt force trauma to the heap and toast files by overwriting parts of them with garbage bytes, restart the server, and verify that the corruption is detected by amcheck's verify_heapam(). The exact trauma is intended to be the same on all platforms, in terms of the number of bytes written and the location in the file where they get written, but owing to differences between platforms, by design the test does not expect a particular corruption message. The test was overwriting far fewer bytes than I had intended, but since it was still sufficient to create corruption on the platforms where I tested, I failed to notice.
It should do a more thorough job now. v21-0001-Fixing-unitialized-variable-bug.patch Description: Binary data v21-0002-Fixing-sloppy-regression-test-coding.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 22, 2020, at 7:06 AM, Robert Haas wrote: > > On Thu, Oct 22, 2020 at 8:51 AM Robert Haas wrote: >> Committed. Let's see what the buildfarm thinks. > > It is mostly happy, but thorntail is not: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-22%2012%3A58%3A11 > > I thought that the problem might be related to the fact that thorntail > is using force_parallel_mode, but I tried that here and it did not > cause a failure. So my next guess is that it is related to the fact > that this is a sparc64 machine, but it's hard to tell, since none of > the other sparc64 critters have run yet. In any case I don't know why > that would cause a failure. The messages in the log aren't very > illuminating, unfortunately. :-( > > Mark, any ideas what might cause specifically that set of tests to fail? The code is correctly handling an uncorrupted table, but then more or less randomly failing some of the time when processing a corrupt table. Tom identified a problem with an uninitialized variable. I'm putting together a new patch set to address it. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
lapwing just spit up a possibly relevant issue:

ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -Werror -fPIC -I. -I. -I../../src/include -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o verify_heapam.o verify_heapam.c
verify_heapam.c: In function 'get_xid_status':
verify_heapam.c:1432:5: error: 'fxid.value' may be used uninitialized in this function [-Werror=maybe-uninitialized]
cc1: all warnings being treated as errors
Re: new heapcheck contrib module
On Thu, Oct 22, 2020 at 10:28 AM Tom Lane wrote: > Considering this is a TAP test, why in the world is it designed to hide > all details of any unexpected amcheck messages? Surely being able to > see what amcheck is saying would be helpful here. > > IOW, don't have the tests abbreviate the module output with count(*), > but return the full thing, and then use a regex to see if you got what > was expected. If you didn't, the output will show what you did get. Yeah, that thought crossed my mind, too. But I'm not sure it would help in the case of this particular failure, because I think the problem is that we're expecting to get complaints and instead getting none. It might be good to change it anyway, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
Robert Haas writes: > The messages in the log aren't very > illuminating, unfortunately. :-( Considering this is a TAP test, why in the world is it designed to hide all details of any unexpected amcheck messages? Surely being able to see what amcheck is saying would be helpful here. IOW, don't have the tests abbreviate the module output with count(*), but return the full thing, and then use a regex to see if you got what was expected. If you didn't, the output will show what you did get. regards, tom lane
Re: new heapcheck contrib module
On Thu, Oct 22, 2020 at 8:51 AM Robert Haas wrote: > Committed. Let's see what the buildfarm thinks. It is mostly happy, but thorntail is not: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-22%2012%3A58%3A11 I thought that the problem might be related to the fact that thorntail is using force_parallel_mode, but I tried that here and it did not cause a failure. So my next guess is that it is related to the fact that this is a sparc64 machine, but it's hard to tell, since none of the other sparc64 critters have run yet. In any case I don't know why that would cause a failure. The messages in the log aren't very illuminating, unfortunately. :-( Mark, any ideas what might cause specifically that set of tests to fail? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Wed, Oct 21, 2020 at 11:45 PM Mark Dilger wrote: > Done that way in the attached, which also include Robert's changes from v19 > he posted earlier today. Committed. Let's see what the buildfarm thinks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 21, 2020, at 1:51 PM, Tom Lane wrote: > > Mark Dilger writes: >> There is still something screwy here, though, as this compiles, links and >> runs fine for me on mac and linux, but not for Robert. > > Are you using --enable-nls at all on your Mac build? Because for sure it > should not work there, given the failure to include -lintl in amcheck's > link step. Some platforms are forgiving of that, but not Mac. Thanks, Tom! No, that's the answer. I had a typo/thinko in my configure options, --with-nls instead of --enable-nls, and the warning about it being an invalid flag went by so fast I didn't see it. I had it spelled correctly on linux, but I guess that's one of the platforms that is more forgiving. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
Mark Dilger writes: > There is still something screwy here, though, as this compiles, links and > runs fine for me on mac and linux, but not for Robert. Are you using --enable-nls at all on your Mac build? Because for sure it should not work there, given the failure to include -lintl in amcheck's link step. Some platforms are forgiving of that, but not Mac. regards, tom lane
Re: new heapcheck contrib module
Robert Haas writes: > I was about to commit 0001, after making some cosmetic changes, when I > discovered that it won't link for me. I think there must be something > wrong with the NLS stuff. My version of 0001 is attached. The error I > got is: Well, the short answer would be "you need to add SHLIB_LINK += $(filter -lintl, $(LIBS)) to the Makefile". However, I would vote against that, because in point of fact amcheck has no translation support, just like all our other contrib modules. What should likely happen instead is to rip out whatever code is overoptimistically expecting it needs to support translation. regards, tom lane
Re: new heapcheck contrib module
> On Oct 21, 2020, at 1:13 PM, Alvaro Herrera wrote: > > On 2020-Oct-21, Robert Haas wrote: > >> On Wed, Oct 7, 2020 at 9:01 PM Mark Dilger >> wrote: >>> This next version, attached, has the acl checking and associated >>> documentation changes split out into patch 0005, making it easier to review >>> in isolation from the rest of the patch series. >>> >>> Independently of acl considerations, this version also has some verbiage >>> changes in 0004, in response to Andrey's review upthread. >> >> I was about to commit 0001, after making some cosmetic changes, when I >> discovered that it won't link for me. I think there must be something >> wrong with the NLS stuff. My version of 0001 is attached. The error I >> got is: > > Hmm ... I don't think we have translation support in contrib, do we? I > think you could solve that by adding a "#undef _, #define _(...) (...)" > or similar at the top of the offending C files, assuming you don't want > to rip out all use of _() there. There is still something screwy here, though, as this compiles, links and runs fine for me on mac and linux, but not for Robert. On mac, I'm using the toolchain from XCode, whereas Robert is using MacPorts. Mine reports:

Apple clang version 11.0.0 (clang-1100.0.33.17)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Robert's reports:

clang version 5.0.2 (tags/RELEASE_502/final)
Target: x86_64-apple-darwin19.4.0
Thread model: posix
InstalledDir: /opt/local/libexec/llvm-5.0/bin

On linux, I'm using gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36). Searching around on the web, there are various reports of MacPorts' clang not linking libintl correctly, though I don't know if that is a real problem with MacPorts or just a few cases of user error. Has anybody else following this thread had issues with MacPorts' version of clang vis-a-vis linking libintl's gettext?
— Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On 2020-Oct-21, Robert Haas wrote: > On Wed, Oct 7, 2020 at 9:01 PM Mark Dilger > wrote: > > This next version, attached, has the acl checking and associated > > documentation changes split out into patch 0005, making it easier to review > > in isolation from the rest of the patch series. > > > > Independently of acl considerations, this version also has some verbiage > > changes in 0004, in response to Andrey's review upthread. > > I was about to commit 0001, after making some cosmetic changes, when I > discovered that it won't link for me. I think there must be something > wrong with the NLS stuff. My version of 0001 is attached. The error I > got is: Hmm ... I don't think we have translation support in contrib, do we? I think you could solve that by adding a "#undef _, #define _(...) (...)" or similar at the top of the offending C files, assuming you don't want to rip out all use of _() there. TBH the usage of "translation:" comments in this patch seems over-enthusiastic to me.
Re: new heapcheck contrib module
On Wed, Oct 7, 2020 at 9:01 PM Mark Dilger wrote: > This next version, attached, has the acl checking and associated > documentation changes split out into patch 0005, making it easier to review > in isolation from the rest of the patch series. > > Independently of acl considerations, this version also has some verbiage > changes in 0004, in response to Andrey's review upthread. I was about to commit 0001, after making some cosmetic changes, when I discovered that it won't link for me. I think there must be something wrong with the NLS stuff. My version of 0001 is attached. The error I got is:

ccache clang -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -O2 -Wall -Werror -fno-omit-frame-pointer -bundle -multiply_defined suppress -o amcheck.so verify_heapam.o verify_nbtree.o -L../../src/port -L../../src/common -L/opt/local/lib -L/opt/local/lib -L/opt/local/lib -L/opt/local/lib -L/opt/local/lib -Wl,-dead_strip_dylibs -Wall -Werror -fno-omit-frame-pointer -bundle_loader ../../src/backend/postgres
Undefined symbols for architecture x86_64:
  "_libintl_gettext", referenced from:
      _verify_heapam in verify_heapam.o
      _check_tuple in verify_heapam.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [amcheck.so] Error 1

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v19-0001-Extend-amcheck-to-check-heap-pages.patch Description: Binary data
Re: new heapcheck contrib module
On Mon, Oct 5, 2020 at 5:24 PM Mark Dilger wrote: > > I don't see how verify_heapam will avoid raising an error during basic > > validation from PageIsVerified(), which will violate the guarantee > > about not throwing errors. I don't see that as a problem myself, but > > presumably you will. > > My concern is not so much that verify_heapam will stop with an error, but > rather that it might trigger a panic that stops all backends. Stopping with > an error merely because it hits corruption is not ideal, as I would rather it > completed the scan and reported all corruptions found, but that's minor > compared to the damage done if verify_heapam creates downtime in a production > environment offering high availability guarantees. That statement might seem > nuts, given that the corrupt table itself would be causing downtime, but that > analysis depends on assumptions about table access patterns, and there is no > a priori reason to think that corrupt pages are necessarily ever being > accessed, or accessed in a way that causes crashes (rather than merely wrong > results) outside verify_heapam scanning the whole table. That seems reasonable to me. I think that it makes sense to never take down the server in a non-debug build with verify_heapam. That's not what I took away from your previous remarks on the issue, but perhaps it doesn't matter now. -- Peter Geoghegan
Re: new heapcheck contrib module
> On Oct 6, 2020, at 11:27 PM, Andrey Borodin wrote: > > > >> On Oct 7, 2020, at 04:20, Mark Dilger >> wrote: >> >> >> >>> On Oct 5, 2020, at 5:24 PM, Mark Dilger >>> wrote: >>> >>> - This version does not change clog handling, which leaves Andrey's concern >>> unaddressed. Peter also showed some support for (or perhaps just a lack of >>> opposition to) doing more of what Andrey suggests. I may come back to this >>> issue, depending on time available and further feedback. >> >> Attached is a patch set that includes the clog handling as discussed. The >> 0001 and 0002 are effectively unchanged since version 16 posted yesterday, >> but this now includes 0003 which creates a non-throwing interface to clog, >> and 0004 which uses the non-throwing interface from within amcheck's heap >> checking functions. >> >> I think this is a pretty good sketch for discussion, though I am unsatisfied >> with the lack of regression test coverage of verify_heapam in the presence >> of clog truncation. I was hoping to have that as part of v17, but since it >> is taking a bit longer than I anticipated, I'll have to come back with that >> in a later patch. >> > > Many thanks, Mark! I really appreciate this functionality. It could save me > many hours of recreating clogs. You are quite welcome, though the thanks may be premature. I posted the 0003 and 0004 patches mostly as concrete implementation examples that can be criticized. > I'm not entirely sure this message is correct: psprintf(_("xmax %u commit > status is lost") > It seems to me to be not commit status, but rather transaction status. I have changed several such messages to say "transaction status" rather than "commit status". I'll be posting it in a separate email, shortly. Thanks for reviewing! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Oct 7, 2020, at 04:20, Mark Dilger > wrote: > > > >> On Oct 5, 2020, at 5:24 PM, Mark Dilger wrote: >> >> - This version does not change clog handling, which leaves Andrey's concern >> unaddressed. Peter also showed some support for (or perhaps just a lack of >> opposition to) doing more of what Andrey suggests. I may come back to this >> issue, depending on time available and further feedback. > > Attached is a patch set that includes the clog handling as discussed. The > 0001 and 0002 are effectively unchanged since version 16 posted yesterday, > but this now includes 0003 which creates a non-throwing interface to clog, > and 0004 which uses the non-throwing interface from within amcheck's heap > checking functions. > > I think this is a pretty good sketch for discussion, though I am unsatisfied > with the lack of regression test coverage of verify_heapam in the presence of > clog truncation. I was hoping to have that as part of v17, but since it is > taking a bit longer than I anticipated, I'll have to come back with that in a > later patch. > Many thanks, Mark! I really appreciate this functionality. It could save me many hours of recreating clogs. I'm not entirely sure this message is correct: psprintf(_("xmax %u commit status is lost") It seems to me that this is not a commit status, but rather a transaction status. Thanks! Best regards, Andrey Borodin.
Re: new heapcheck contrib module
On Tue, Aug 25, 2020 at 07:36:53AM -0700, Mark Dilger wrote: > Removed. This patch is failing to compile on Windows: C:\projects\postgresql\src\include\fe_utils/print.h(18): fatal error C1083: Cannot open include file: 'libpq-fe.h': No such file or directory [C:\projects\postgresql\pg_amcheck.vcxproj] It looks like you forgot to tweak the scripts in src/tools/msvc/. -- Michael signature.asc Description: PGP signature
Re: new heapcheck contrib module
Greetings, * Peter Geoghegan (p...@bowt.ie) wrote: > On Tue, Sep 22, 2020 at 12:41 PM Robert Haas wrote: > > But now I see that there's no secondary permission check in the > > verify_nbtree.c code. Is that intentional? Peter, what's the > > justification for that? > > As noted by comments in contrib/amcheck/sql/check_btree.sql (the > verify_nbtree.c tests), this is intentional. Note that we explicitly > test that a non-superuser role can perform verification following > GRANT EXECUTE ON FUNCTION ... . > As I mentioned earlier, this is supported (or at least it is supported > in my interpretation of things). It just isn't documented anywhere > outside the test itself. Would certainly be good to document this, but I tend to agree with the comments that ideally: a) it'd be nice if a relatively low-privileged user/process could run the tests in an ongoing manner; b) we don't want to add more is-superuser checks; c) users shouldn't really be given the ability to see rows they're not supposed to have access to. In other places in the code, when an error is generated and the user doesn't have access to the underlying table or doesn't have BYPASSRLS, we don't include the details or the actual data in the error. Perhaps that approach would make sense here (or perhaps not, but it doesn't seem entirely crazy to me, anyway).
In other words: a) keep the ability for someone who has EXECUTE on the function to be able to run the function against any relation b) when we detect an issue, perform a permissions check to see if the user calling the function has rights to read the rows of the table and, if RLS is enabled on the table, if they have BYPASSRLS c) if the user has appropriate privileges, log the detailed error, if not, return a generic error with a HINT that details weren't available due to lack of privileges on the relation I can appreciate the concerns regarding dead rows ending up being visible to someone who wouldn't normally be able to see them but I'd argue we could simply document that fact rather than try to build something to address it, for this particular case. If there's push back on that then I'd suggest we have a "can read dead rows" or some such capability that can be GRANT'd (in the form of a default role, I would think) which a user would also have to have in order to get detailed error reports from this function. Thanks, Stephen signature.asc Description: PGP signature
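Stephen's a/b/c proposal reduces to a single decision at corruption-report time. The sketch below illustrates that decision only; the struct and predicate names are hypothetical stand-ins — in PostgreSQL the real checks would go through pg_class_aclcheck() and the catalog's row-security flags, not a plain struct.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for the real catalog/ACL lookups. */
typedef struct
{
    bool can_select_table;  /* caller has SELECT on the relation */
    bool rls_enabled;       /* row-level security is enabled on it */
    bool has_bypassrls;     /* caller's role has BYPASSRLS */
} CheckerPrivs;

/*
 * Per the proposal: anyone granted EXECUTE may run the check, but detailed
 * (potentially row-revealing) corruption messages require read privilege
 * on the table, and BYPASSRLS as well when RLS is in force; otherwise the
 * caller gets only a generic error with a hint.
 */
static bool
may_see_corruption_details(CheckerPrivs p)
{
    if (!p.can_select_table)
        return false;
    if (p.rls_enabled && !p.has_bypassrls)
        return false;
    return true;
}
```

The point of structuring it this way is that the privilege test happens only when corruption is actually found, so the common no-corruption path needs no extra catalog lookups.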
Re: new heapcheck contrib module
Peter Geoghegan writes: > On Mon, Sep 21, 2020 at 2:09 PM Robert Haas wrote: >> +REVOKE ALL ON FUNCTION >> +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint) >> +FROM PUBLIC; >> >> This too. > Do we really want to use a cstring as an enum-like argument? Ugh. We should not be using cstring as a SQL-exposed datatype unless there really is no alternative. Why wasn't this argument declared "text"? regards, tom lane
Re: new heapcheck contrib module
On Sat, Aug 29, 2020 at 10:48 AM Mark Dilger wrote: > I had an earlier version of the verify_heapam patch that included a > non-throwing interface to clog. Ultimately, I ripped that out. My reasoning > was that a simpler patch submission was more likely to be acceptable to the > community. Isn't some kind of pragmatic compromise possible? > But I don't want to make this patch dependent on that hypothetical patch > getting written and accepted. Fair enough, but if you're alluding to what I said then about check_tuphdr_xids()/clog checking a while back then FWIW I didn't intend to block progress on clog/xact status verification at all. I just don't think that it is sensible to impose an iron clad guarantee about having no assertion failures with corrupt clog data -- that leads to far too much code duplication. But why should you need to provide an absolute guarantee of that? I for one would be fine with making the clog checks an optional extra, that rescinds the no crash guarantee that you're keen on -- just like with the TOAST checks that you have already in v15. It might make sense to review how often crashes occur with simulated corruption, and then to minimize the number of occurrences in the real world. Maybe we could tolerate a usually-no-crash interface to clog -- if it could still have assertion failures. Making a strong guarantee about assertions seems unnecessary. I don't see how verify_heapam will avoid raising an error during basic validation from PageIsVerified(), which will violate the guarantee about not throwing errors. I don't see that as a problem myself, but presumably you will. -- Peter Geoghegan
Re: new heapcheck contrib module
On Mon, Sep 21, 2020 at 2:09 PM Robert Haas wrote: > +REVOKE ALL ON FUNCTION > +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint) > +FROM PUBLIC; > > This too. Do we really want to use a cstring as an enum-like argument? I think that I see a bug at this point in check_tuple() (in v15-0001-Adding-function-verify_heapam-to-amcheck-module.patch): > + /* If xmax is a multixact, it should be within valid range */ > + xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr); > + if ((infomask & HEAP_XMAX_IS_MULTI) && !mxid_valid_in_rel(xmax, ctx)) > + { *** SNIP *** > + } > + > + /* If xmax is normal, it should be within valid range */ > + if (TransactionIdIsNormal(xmax)) > + { Why should it be okay to call TransactionIdIsNormal(xmax) at this point? It isn't certain that xmax is an XID at all (could be a MultiXactId, since you called HeapTupleHeaderGetRawXmax() to get the value in the first place). Don't you need to check "(infomask & HEAP_XMAX_IS_MULTI) == 0" here? This does look like it's shaping up. Thanks for working on it, Mark. -- Peter Geoghegan
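The guard Peter is asking for amounts to: only treat the raw xmax as a plain transaction ID when HEAP_XMAX_IS_MULTI is clear. A standalone sketch of that control flow follows; the constants and helper macros are simplified stand-ins for the real htup_details.h definitions, not the patch code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the htup_details.h definitions. */
#define HEAP_XMAX_IS_MULTI 0x1000   /* raw xmax holds a MultiXactId */
typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Only interpret the raw xmax field as a plain transaction ID when the
 * HEAP_XMAX_IS_MULTI bit is clear; otherwise the same bits hold a
 * MultiXactId and must never be fed to XID-only validity checks.
 */
static bool
xmax_is_plain_normal_xid(uint16_t infomask, TransactionId raw_xmax)
{
    if (infomask & HEAP_XMAX_IS_MULTI)
        return false;
    return TransactionIdIsNormal(raw_xmax);
}
```

Without the infomask test, a MultiXactId whose numeric value happens to look like a normal XID would be validated against the wrong ranges, which is exactly the bug being pointed out.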
Re: new heapcheck contrib module
On Tue, Sep 22, 2020 at 12:41 PM Robert Haas wrote: > But now I see that there's no secondary permission check in the > verify_nbtree.c code. Is that intentional? Peter, what's the > justification for that? As noted by comments in contrib/amcheck/sql/check_btree.sql (the verify_nbtree.c tests), this is intentional. Note that we explicitly test that a non-superuser role can perform verification following GRANT EXECUTE ON FUNCTION ... . As I mentioned earlier, this is supported (or at least it is supported in my interpretation of things). It just isn't documented anywhere outside the test itself. -- Peter Geoghegan
Re: new heapcheck contrib module
On Tue, Sep 22, 2020 at 1:55 PM Mark Dilger wrote: > I am inclined to just restrict verify_heapam() to superusers and be done. > What do you think? I think that's an old and largely failed approach. If you want to use pg_class_ownercheck here rather than pg_class_aclcheck or something like that, seems fair enough. But I don't think there should be an is-superuser check in the code, because we've been trying really hard to get rid of those in most places. And I also don't think there should be no secondary permissions check, because if somebody does grant execute permission on these functions, it's unlikely that they want the person getting that permission to be able to check every relation in the system even those on which they have no other privileges at all. But now I see that there's no secondary permission check in the verify_nbtree.c code. Is that intentional? Peter, what's the justification for that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Tue, Sep 22, 2020 at 10:55 AM Mark Dilger wrote: > I am inclined to just restrict verify_heapam() to superusers and be done. > What do you think? The existing amcheck functions were designed to have execute privilege granted to non-superusers, though we never actually advertised that fact. Maybe now would be a good time to start doing so. -- Peter Geoghegan
Re: new heapcheck contrib module
> On Sep 21, 2020, at 2:09 PM, Robert Haas wrote: > > I think there should be a call to pg_class_aclcheck() here, just like > the one in pg_prewarm, so that if the superuser does choose to grant > access, users given access can check tables they anyway have > permission to access, but not others. Maybe put that in > check_relation_relkind_and_relam() and rename it. Might want to look > at the pg_surgery precedent, too. In the presence of corruption, verify_heapam() reports to the user (in other words, leaks) metadata about the corrupted rows. Reasoning about the attack vectors this creates is hard, but a conservative approach is to assume that an attacker can cause corruption in order to benefit from the leakage, and make sure the leakage does not violate any reasonable security expectations. Basing the security decision on whether the user has access to read the table seems insufficient, as it ignores row level security. Perhaps that is ok if row level security is not enabled for the table or if the user has been granted BYPASSRLS. There is another problem, though. There is no grantable privilege to read dead rows. In the case of corruption, verify_heapam() may well report metadata about dead rows. pg_surgery also appears to leak information about dead rows. Owners of tables can probe whether supplied TIDs refer to dead rows. If a table containing sensitive information has rows deleted prior to ownership being transferred, the new owner of the table could probe each page of deleted data to determine something of the content that was there. Information about the number of deleted rows is already available through the pg_stat_* views, but those views don't give such a fine-grained approach to figuring out how large each deleted row was. For a table with fixed content options, the content can sometimes be completely inferred from the length of the row. (Consider a table with a single text column containing either "approved" or "denied".) 
But pg_surgery is understood to be a collection of sharp tools only to be used under fairly exceptional conditions. amcheck, on the other hand, is something that feels safer and more reasonable to use on a regular basis, perhaps from a cron job executed by a less trusted user. Forcing the user to be superuser makes it clearer that this feeling of safety is not justified. I am inclined to just restrict verify_heapam() to superusers and be done. What do you think? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
On Tue, Aug 25, 2020 at 10:36 AM Mark Dilger wrote: > Thanks for the review! + msg OUT text + ) Looks like atypical formatting. +REVOKE ALL ON FUNCTION +verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint) +FROM PUBLIC; This too. +-- Don't want this to be available to public Add "by default, but superusers can grant access" or so? I think there should be a call to pg_class_aclcheck() here, just like the one in pg_prewarm, so that if the superuser does choose to grant access, users given access can check tables they anyway have permission to access, but not others. Maybe put that in check_relation_relkind_and_relam() and rename it. Might want to look at the pg_surgery precedent, too. Oh, and that function's header comment is also wrong. I think that the way the checks on the block range are performed could be improved. Generally, we want to avoid reporting the same problem with a variety of different message strings, because it adds burden for translators and is potentially confusing for users. You've got two message strings that are only going to be used for empty relations and a third message string that is only going to be used for non-empty relations. What stops you from just ripping off the way that this is done in pg_prewarm, which requires only 2 messages? Then you'd be adding a net total of 0 new messages instead of 3, and in my view they would be clearer than your third message, "block range is out of bounds for relation with block count %u: " INT64_FORMAT " .. " INT64_FORMAT, which doesn't say very precisely what the problem is, and also falls afoul of our usual practice of avoiding the use of INT64_FORMAT in error messages that are subject to translation. I notice that pg_prewarm just silently does nothing if the start and end blocks are swapped, rather than generating an error. We could choose to do differently here, but I'm not sure why we should bother.
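The pg_prewarm-style bounds check being suggested needs only two messages, one per bad bound, with swapped bounds silently yielding an empty scan. A standalone sketch of that shape — the names and the exact behavior are illustrative guesses at the pg_prewarm pattern, not the actual amcheck or pg_prewarm code:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

typedef enum
{
    RANGE_OK,           /* both bounds valid (possibly an empty range) */
    RANGE_START_OOB,    /* starting block number is out of range */
    RANGE_END_OOB       /* ending block number is out of range */
} RangeStatus;

/*
 * Validate a user-supplied block range against the relation size in the
 * two-message style: each bound is checked independently, and swapped
 * bounds simply produce an empty scan rather than a third error message.
 */
static RangeStatus
check_block_range(int64_t start, int64_t end, BlockNumber nblocks)
{
    if (start < 0 || start >= (int64_t) nblocks)
        return RANGE_START_OOB;
    if (end < 0 || end >= (int64_t) nblocks)
        return RANGE_END_OOB;
    return RANGE_OK;
}
```

Because each bound gets its own message, the error text can name the offending value precisely without resorting to INT64_FORMAT in a translatable string.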
+ all_frozen = mapbits & VISIBILITYMAP_ALL_VISIBLE; + all_visible = mapbits & VISIBILITYMAP_ALL_FROZEN; + + if ((all_frozen && skip_option == SKIP_PAGES_ALL_FROZEN) || + (all_visible && skip_option == SKIP_PAGES_ALL_VISIBLE)) + { + continue; + } This isn't horrible style, but why not just get rid of the local variables? e.g. if (skip_option == SKIP_PAGES_ALL_FROZEN) { if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0) continue; } else { ... } Typically no braces around a block containing only one line. + * table contains corrupt all frozen bits, a concurrent vacuum might skip the all-frozen? + * relfrozenxid beyond xid.) Reporting the xid as valid under such conditions + * seems acceptable, since if we had checked it earlier in our scan it would + * have truly been valid at that time, and we break no MVCC guarantees by + * failing to notice the concurrent change in its status. I agree with the first half of this sentence, but I don't know what MVCC guarantees have to do with anything. I'd just delete the second part, or make it a lot clearer. + * Some kinds of tuple header corruption make it unsafe to check the tuple + * attributes, for example when the tuple is foreshortened and such checks + * would read beyond the end of the line pointer (and perhaps the page). In I think of foreshortening mostly as an art term, though I guess it has other meanings. Maybe it would be clearer to say something like "Some kinds of corruption make it unsafe to check the tuple attributes, for example when the line pointer refers to a range of bytes outside the page"? + * Other kinds of tuple header corruption do not bare on the question of bear + pstrdup(_("updating transaction ID marked incompatibly as keys updated and locked only"))); + pstrdup(_("updating transaction ID marked incompatibly as committed and as a multitransaction ID"))); "updating transaction ID" might scare somebody who thinks that you are telling them that you changed something. 
That's not what it means, but it might not be totally clear. Maybe: tuple is marked as only locked, but also claims key columns were updated multixact should not be marked committed + psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, has nulls)"), For these, how about: tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, has nulls) etc. + psprintf(_("old-style VACUUM FULL transaction ID is in the future: %u"), + psprintf(_("old-style VACUUM FULL transaction ID precedes freeze threshold: %u"), + psprintf(_("old-style VACUUM FULL transaction ID is invalid in this relation: %u"), old-style VACUUM FULL transaction ID %u is in the future old-style VACUUM FULL transaction ID %u precedes freeze threshold old-style VACUUM FULL transaction ID %u is invalid in this relation
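The restructuring Robert suggests earlier in this message for the visibility-map skip test (note also that the quoted hunk appears to assign the ALL_VISIBLE bit to all_frozen and vice versa) can be sketched as a single predicate. The bit values here are illustrative stand-ins for the visibilitymap.h definitions, not the patch code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the visibilitymap.h bit definitions. */
#define VISIBILITYMAP_ALL_VISIBLE 0x01
#define VISIBILITYMAP_ALL_FROZEN  0x02

typedef enum
{
    SKIP_PAGES_NONE,
    SKIP_PAGES_ALL_VISIBLE,
    SKIP_PAGES_ALL_FROZEN
} SkipPages;

/*
 * Decide whether a page may be skipped, testing the visibility-map bit
 * that actually corresponds to the chosen skip option -- no intermediate
 * all_frozen/all_visible locals to get crossed.
 */
static bool
should_skip_page(uint8_t mapbits, SkipPages skip_option)
{
    if (skip_option == SKIP_PAGES_ALL_FROZEN)
        return (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0;
    if (skip_option == SKIP_PAGES_ALL_VISIBLE)
        return (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0;
    return false;
}
```

Folding the test into one branch per option removes the two local variables entirely, which is where the apparent bit swap crept in.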
Re: new heapcheck contrib module
> On Aug 29, 2020, at 3:27 AM, Andrey M. Borodin wrote: > > > >> On Aug 29, 2020, at 00:56, Robert Haas wrote: >> >> On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin >> wrote: >>> I don't think so. ISTM it's the same problem of xmax < relfrozenxid, just hidden behind detoasting. >>> Our regular heap_check was checking xmin/xmax invariants for tables, but >>> failed to recognise the problem in toast (while toast was accessible until >>> CLOG truncation). >> >> The code can (and should, and I think does) refrain from looking up >> XIDs that are out of the range thought to be valid -- but how do you >> propose that it avoid looking up XIDs that ought to have clog data >> associated with them despite being >= relfrozenxid and < nextxid? >> TransactionIdDidCommit() does not have a suppress-errors flag, adding >> one would be quite invasive, yet we cannot safely perform a >> significant number of checks without knowing whether the inserting >> transaction committed. > > What you write seems completely correct to me. I agree that CLOG threshold lookups seem unnecessary. > > But I have a real corruption at hand (on a testing site). I have the heapcheck proposed here, and I have pg_surgery from the thread nearby, yet I cannot > fix the problem, because I cannot list the affected tuples. These tools do not > solve a problem that has been neglected for long enough. It would be supercool if they > could. > > This corruption, like caries, had 3 stages: > 1. an incorrect VM flag saying the page does not need vacuum > 2. xmin and xmax < relfrozenxid > 3. CLOG truncated > > Stage 2 is curable with the proposed toolset, stage 3 is not. But they are not > that different. I had an earlier version of the verify_heapam patch that included a non-throwing interface to clog. Ultimately, I ripped that out. My reasoning was that a simpler patch submission was more likely to be acceptable to the community.
If you want to submit a separate patch that creates a non-throwing version of the clog interface, and get the community to accept and commit it, I would seriously consider using that from verify_heapam. If it gets committed in time, I might even do so for this release cycle. But I don't want to make this patch dependent on that hypothetical patch getting written and accepted. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: new heapcheck contrib module
> On Aug 29, 2020, at 00:56, Robert Haas wrote: > > On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin > wrote: >> I don't think so. ISTM it's the same problem of xmax < relfrozenxid, just hidden behind detoasting. >> Our regular heap_check was checking xmin/xmax invariants for tables, but >> failed to recognise the problem in toast (while toast was accessible until >> CLOG truncation). > > The code can (and should, and I think does) refrain from looking up > XIDs that are out of the range thought to be valid -- but how do you > propose that it avoid looking up XIDs that ought to have clog data > associated with them despite being >= relfrozenxid and < nextxid? > TransactionIdDidCommit() does not have a suppress-errors flag, adding > one would be quite invasive, yet we cannot safely perform a > significant number of checks without knowing whether the inserting > transaction committed. What you write seems completely correct to me. I agree that CLOG threshold lookups seem unnecessary. But I have a real corruption at hand (on a testing site). I have the heapcheck proposed here, and I have pg_surgery from the thread nearby, yet I cannot fix the problem, because I cannot list the affected tuples. These tools do not solve a problem that has been neglected for long enough. It would be supercool if they could. This corruption, like caries, had 3 stages: 1. an incorrect VM flag saying the page does not need vacuum 2. xmin and xmax < relfrozenxid 3. CLOG truncated Stage 2 is curable with the proposed toolset, stage 3 is not. But they are not that different. Thanks! Best regards, Andrey Borodin.
Re: new heapcheck contrib module
On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin wrote: > I don't think so. ISTM it's the same problem of xmax < relfrozenxid, just hidden behind detoasting. > Our regular heap_check was checking xmin/xmax invariants for tables, but > failed to recognise the problem in toast (while toast was accessible until > CLOG truncation). The code can (and should, and I think does) refrain from looking up XIDs that are out of the range thought to be valid -- but how do you propose that it avoid looking up XIDs that ought to have clog data associated with them despite being >= relfrozenxid and < nextxid? TransactionIdDidCommit() does not have a suppress-errors flag, adding one would be quite invasive, yet we cannot safely perform a significant number of checks without knowing whether the inserting transaction committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
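The window Robert describes — only XIDs >= relfrozenxid and < nextxid should be looked up in clog at all — has to be tested with wraparound-aware arithmetic, since transaction IDs live on a circle. A minimal sketch; the comparison mirrors the logic of PostgreSQL's TransactionIdPrecedes for normal XIDs, but this is a simplified illustration, not the amcheck code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Wraparound-aware "a precedes b" for normal XIDs: on the xid circle,
 * a precedes b when the signed distance from b to a is negative.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * True if xid falls in [relfrozenxid, next_xid): the window for which
 * clog can still be expected to hold the transaction's status. XIDs
 * outside it must not be looked up, or clog may throw.
 */
static bool
xid_in_checkable_range(TransactionId xid, TransactionId relfrozenxid,
                       TransactionId next_xid)
{
    return !xid_precedes(xid, relfrozenxid) && xid_precedes(xid, next_xid);
}
```

Note that this filters out XIDs whose clog pages are known to be gone, but — as Robert says — it cannot help with an XID inside the window whose clog data was wrongly truncated; that is the case TransactionIdDidCommit() has no non-throwing interface for.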
Re: new heapcheck contrib module
> On Aug 28, 2020, at 11:10 AM, Andrey M. Borodin wrote: > > > >> On Aug 28, 2020, at 18:58, Robert Haas wrote: >> In the case >> you mention, I think we should view that as a problem with clog rather >> than a problem with the table, and thus out of scope. > > I don't think so. ISTM it's the same problem of xmax < relfrozenxid, just hidden behind detoasting. > Our regular heap_check was checking xmin/xmax invariants for tables, but > failed to recognise the problem in toast (while toast was accessible until > CLOG truncation). > > Best regards, Andrey Borodin. If you lock the relations involved, check the toast table first, the toast index second, and the main table third, do you still get the problem? Look at how pg_amcheck handles this and let me know if you still see a problem. There is the ever-present problem that external forces, like a rogue process deleting backend files, will strike at precisely the wrong moment, but barring that kind of concurrent corruption, I think checking the toast table prior to the main table solves some of the issues you are worried about. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company