On 4/9/24 19:44, Tomas Vondra wrote:

On 4/9/24 09:59, Martín Marqués wrote:
Hello,

While doing some work/research on the new incremental backup feature
some limitations were not listed in the docs. Mainly the fact that
pg_combienbackup works with plain format and not tar.


Right. The docs mostly imply this by talking about output directory and
backup directories, but making it more explicit would not hurt.

FWIW it'd be great if we could make incremental backups work with tar
format in the future too. People probably don't want to keep around the
expanded data directory or extract everything before combining the
backups is not very convenient. Reading and writing the tar would make
this simpler.

I have a hard time seeing this feature as being very useful, especially for large databases, until pg_combinebackup works on tar (and compressed tar). Right now restoring an incremental requires at least twice the space of the original cluster, which is going to take a lot of users by surprise.

I know you have made some improvements here for COW filesystems, but my experience is that Postgres is generally not run on such filesystems, though that is changing a bit.

Around the same time, Tomas Vondra tested incremental backups with a
cluster where he enabled checksums after taking the previous full
backup. After combining the backups the synthetic backup had pages
with checksums and other pages without checksums which ended in
checksum errors.

I'm not sure just documenting this limitation is sufficient. We can't
make the incremental backups work in this case (it's as if someone
messes with cluster without writing stuff into WAL), but I think we
should do better than silently producing (seemingly) corrupted backups.

I say seemingly, because the backup is actually fine, the only problem
is it has checksums enabled in the controlfile, but the pages from the
full backup (and the early incremental backups) have no checksums.

What we could do is detect this in pg_combinebackup, and either just
disable checksums with a warning and hint to maybe enable them again. Or
maybe just print that the user needs to disable them.

I was thinking maybe we could detect this while taking the backups, and
force taking a full backup if checksums got enabled since the last
backup. But we can't do that because we only have the manifest from the
last backup, and the manifest does not include info about checksums.

I'd say making a new full backup is the right thing to do in this case. It should be easy enough to store the checksum state of the cluster in the manifest.

Regards,
-David


Reply via email to