On Sun, Mar 29, 2020 at 08:42:35PM -0400, Robert Haas wrote: > On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <n...@leadboat.com> wrote: > > I think this functionality doesn't belong in its own program. If you > > suspect > > pg_basebackup or pg_restore will eventually gain the ability to merge > > incremental backups into a recovery-ready base backup, I would put the > > functionality in that program. Otherwise, I would put it in pg_checksums. > > For me, part of the friction here is that the program description indicates > > general verification, but the actual functionality merely checks hashes on a > > directory tree that happens to represent a PostgreSQL base backup. > > Suraj's original patch made this part of pg_basebackup, but I didn't > really like that, because I wanted it to have its own set of options. > I still think all the options I've added are pretty useful ones, and I > can think of other things somebody might want to do. It feels very > uncomfortable to make pg_basebackup, or pg_checksums, take either > options from set A and do thing X, or options from set B and do thing > Y.
pg_checksums does already have that property, for what it's worth. (More specifically, certain options dictate the mode, and it reports an error if another option is incompatible with the mode.) > But it feels clear that the name pg_validatebackup is not going > over very well with anyone. I think I should rename it to > pg_validatemanifest. Between those two, I would use "pg_validatebackup" if there's a fair chance it will end up doing the pg_waldump check. Otherwise, I would use "pg_validatemanifest". I still most prefer delivering this as a mode of an existing program. > > > + parse->pathname = palloc(raw_length + 1); > > > > I don't see this freed anywhere; is it? (It's useful to make peak memory > > consumption not grow in proportion to the number of files backed up.) > > We need the hash table to remain populated for the whole run time of > the tool, because we're essentially doing a full join of the actual > directory contents against the manifest contents. That's a bit > unfortunate but it doesn't seem simple to improve. I think the only > people who are really going to suffer are people who have an enormous > pile of empty or nearly-empty relations. People who have large > databases for the normal reason - i.e. a reasonable number of tables > that hold a lot of data - will have manifests of very manageable size. Okay.