On Wed, Aug 30, 2023 at 4:50 PM Robert Haas <robertmh...@gmail.com> wrote: [..]
I've played a little bit more with this second batch of patches on
e8d74ad625f7344f6b715254d3869663c1569a51 @ 31Aug (days before the wait
events refactor):

    test_across_wallevelminimal.sh
    test_many_incrementals_dbcreate.sh
    test_many_incrementals.sh
    test_multixact.sh
    test_pending_2pc.sh
    test_reindex_and_vacuum_full.sh
    test_truncaterollback.sh
    test_unlogged_table.sh

All of those basic tests had GOOD results. Please find them attached.
I'll try to schedule some more realistic tests (in terms of workload
and sizes) in a couple of days, plus maybe have some fun with
cross-backups-and-restores across standbys.

As for my earlier doubt: the raw wal_level = minimal situation
shouldn't be a concern, simply because it requires max_wal_senders ==
0, while pg_basebackup requires that to be above 0 (it fails with
"FATAL: number of requested standby connections exceeds
max_wal_senders (currently 0)"); a quick repro sketch is below my
signature. I also wanted to introduce corruption into the
pg_walsummaries files, but then saw in the code that they are already
covered by CRC32 - cool.

In v07:

> +#define MINIMUM_VERSION_FOR_WAL_SUMMARIES 160000

Shouldn't that be 170000?

> A related design question is whether we should really be sending the
> whole backup manifest to the server at all. If it turns out that we
> don't really need anything except for the LSN of the previous backup,
> we could send that one piece of information instead of everything. On
> the other hand, if we need the list of files from the previous backup,
> then sending the whole manifest makes sense.

If that is still an area open for discussion: wouldn't it be better to
just specify the LSN? That would allow resyncing a standby across a
major lag, where the WAL to replay would be enormous. Given a
primary->standby pair where the standby is stuck at some LSN, right
now it would be:

1) calculate the backup manifest of the desynced 10TB standby (how?
   using which tool?) - even if possible, that means reading 10TB of
   data instead of just passing a single number, doesn't it?
2) take an incremental backup of the primary relative to that LSN
3) copy the incremental backup to the standby
4) apply it to the impaired standby
5) restart the WAL replay

(A sketch of that flow, using a purely hypothetical option, is also
below my signature.)

> - We only know how to operate on directories, not tar files. I thought
> about that when working on pg_verifybackup as well, but I didn't do
> anything about it. It would be nice to go back and make that tool work
> on tar-format backups, and this one, too. I don't think there would be
> a whole lot of point trying to operate on compressed tar files because
> you need random access and that seems hard on a compressed file, but
> on uncompressed files it seems at least theoretically doable. I'm not
> sure whether anyone would care that much about this, though, even
> though it does sound pretty cool.

Also, maybe it's too early to ask, but wouldn't it be nice if
pg_combinebackup had a future option to avoid double writes when used
from restore hosts? Right now we need to first reconstruct the
original datadir from the full and incremental backups on the host
storing the backups, and then transfer it again to the target host.
Something like this could work well from the restore host:

    pg_combinebackup /tmp/backup1 /tmp/incbackup2 /tmp/incbackup3 -O tar -o - | \
        ssh dbserver 'tar xvf - -C /path/to/restored/cluster'

The bad thing is that such a pipe prevents parallelism from day 1, and
I'm afraid I don't have a better easy idea on how to have both at the
same time in the long term. (The last sketch below contrasts this with
today's double-write flow.)

-J.
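PS: To make the wal_level = minimal point above concrete, here is a
minimal repro sketch (paths and port are invented for illustration;
note the server itself refuses to start with wal_level = minimal
unless max_wal_senders is 0):

    initdb -D /tmp/minimal
    echo "wal_level = minimal"  >> /tmp/minimal/postgresql.conf
    echo "max_wal_senders = 0"  >> /tmp/minimal/postgresql.conf
    pg_ctl -D /tmp/minimal -l /tmp/minimal.log start
    pg_basebackup -D /tmp/basebackup -p 5432
    # => FATAL:  number of requested standby connections exceeds
    #    max_wal_senders (currently 0)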
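Here is also a sketch of how the LSN-only variant could shorten the
standby-resync steps above. The --incremental-lsn option is purely
hypothetical (nothing like it exists in the patch set, which sends a
manifest instead); the remaining tools and functions are real:

    # 1) grab the LSN where the impaired standby got stuck - cheap,
    #    no need to read 10TB of data:
    LSN=$(psql -h standby -At -c "SELECT pg_last_wal_replay_lsn()")
    # 2) hypothetical: incremental backup of the primary relative to
    #    that raw LSN instead of a previous backup's manifest
    pg_basebackup -h primary -D /tmp/incr --incremental-lsn="$LSN"
    # 3-5) copy /tmp/incr to the standby, lay it over the impaired
    #      datadir, and restart WAL replay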
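And finally, the double-write flow we have today versus the proposed
pipe. With the patch set as posted (assuming the -o output-directory
syntax), the reconstructed datadir hits disk once on the backup host
and a second time during the transfer:

    # write #1: reconstruct on the backup host
    pg_combinebackup -o /tmp/restored /tmp/backup1 /tmp/incbackup2 /tmp/incbackup3
    # write #2: transfer the result to the db server
    rsync -a /tmp/restored/ dbserver:/path/to/restored/cluster/

The hypothetical -O tar / -o - pipe above would write the data on the
target only once, at the cost of the parallelism problem mentioned.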
incrbackuptests-0.1.tgz