Re: backup manifests

Noah Misch Sun, 29 Mar 2020 23:00:00 -0700

On Sun, Mar 29, 2020 at 08:42:35PM -0400, Robert Haas wrote:
> On Sat, Mar 28, 2020 at 11:40 PM Noah Misch <[email protected]> wrote:
> > I think this functionality doesn't belong in its own program.  If you 
> > suspect
> > pg_basebackup or pg_restore will eventually gain the ability to merge
> > incremental backups into a recovery-ready base backup, I would put the
> > functionality in that program.  Otherwise, I would put it in pg_checksums.
> > For me, part of the friction here is that the program description indicates
> > general verification, but the actual functionality merely checks hashes on a
> > directory tree that happens to represent a PostgreSQL base backup.
> 
> Suraj's original patch made this part of pg_basebackup, but I didn't
> really like that, because I wanted it to have its own set of options.
> I still think all the options I've added are pretty useful ones, and I
> can think of other things somebody might want to do. It feels very
> uncomfortable to make pg_basebackup, or pg_checksums, take either
> options from set A and do thing X, or options from set B and do thing
> Y.


pg_checksums does already have that property, for what it's worth.  (More
specifically, certain options dictate the mode, and it reports an error if
another option is incompatible with the mode.)

> But it feels clear that the name pg_validatebackup is not going
> over very well with anyone. I think I should rename it to
> pg_validatemanifest.

Between those two, I would use "pg_validatebackup" if there's a fair chance it
will end up doing the pg_waldump check.  Otherwise, I would use
"pg_validatemanifest".  I still most prefer delivering this as a mode of an
existing program.

> > > +             parse->pathname = palloc(raw_length + 1);
> >
> > I don't see this freed anywhere; is it?  (It's useful to make peak memory
> > consumption not grow in proportion to the number of files backed up.)
> 
> We need the hash table to remain populated for the whole run time of
> the tool, because we're essentially doing a full join of the actual
> directory contents against the manifest contents. That's a bit
> unfortunate but it doesn't seem simple to improve. I think the only
> people who are really going to suffer are people who have an enormous
> pile of empty or nearly-empty relations. People who have large
> databases for the normal reason - i.e. a reasonable number of tables
> that hold a lot of data - will have manifests of very manageable size.

Okay.

Re: backup manifests

Reply via email to