On Wed, Aug 30, 2023 at 4:50 PM Robert Haas <robertmh...@gmail.com> wrote: [..]
I've played a little bit more with this second batch of patches on
e8d74ad625f7344f6b715254d3869663c1569a51 @ 31Aug (days before the wait
events refactor):

    test_across_wallevelminimal.sh
    test_many_incrementals_dbcreate.sh
    test_many_incrementals.sh
    test_multixact.sh
    test_pending_2pc.sh
    test_reindex_and_vacuum_full.sh
    test_truncaterollback.sh
    test_unlogged_table.sh

All of those basic tests had GOOD results. Please find them attached.
I'll try to schedule some more realistic tests (in terms of workload
and sizes) in a couple of days, plus maybe have some fun with
cross-backups-and-restores across standbys.

As for my earlier doubt: the raw wal_level = minimal situation
shouldn't be a concern, simply because it requires max_wal_senders ==
0, while pg_basebackup requires that to be above 0 (it fails with
"FATAL: number of requested standby connections exceeds
max_wal_senders (currently 0)"); a quick repro sketch is below my
signature. I also wanted to introduce corruption into the
pg_walsummaries files, but then saw in the code that they are already
covered by CRC32 - cool.

In v07:

> +#define MINIMUM_VERSION_FOR_WAL_SUMMARIES 160000

Shouldn't that be 170000?

> A related design question is whether we should really be sending the
> whole backup manifest to the server at all. If it turns out that we
> don't really need anything except for the LSN of the previous backup,
> we could send that one piece of information instead of everything. On
> the other hand, if we need the list of files from the previous backup,
> then sending the whole manifest makes sense.

If that is still an area open for discussion: wouldn't it be better to
just specify the LSN? That would allow resyncing a standby across a
major lag, where the WAL to replay would be enormous. Given a
primary->standby pair where the standby is stuck at some LSN, right
now it would be:

1) calculate the backup manifest of the desynced 10TB standby (how?
   using which tool?) - even if possible, that means reading 10TB of
   data instead of just passing a single number, doesn't it?
2) take an incremental backup of the primary relative to that LSN
3) copy the incremental backup to the standby
4) apply it to the impaired standby
5) restart the WAL replay

(A sketch of that flow, using a purely hypothetical option, is also
below my signature.)

> - We only know how to operate on directories, not tar files. I thought
> about that when working on pg_verifybackup as well, but I didn't do
> anything about it. It would be nice to go back and make that tool work
> on tar-format backups, and this one, too. I don't think there would be
> a whole lot of point trying to operate on compressed tar files because
> you need random access and that seems hard on a compressed file, but
> on uncompressed files it seems at least theoretically doable. I'm not
> sure whether anyone would care that much about this, though, even
> though it does sound pretty cool.

Also, maybe it's too early to ask, but wouldn't it be nice if
pg_combinebackup had a future option to avoid double writes when used
from restore hosts? Right now we need to first reconstruct the
original datadir from the full and incremental backups on the host
storing the backups, and then transfer it again to the target host.
Something like this could work well from the restore host:

    pg_combinebackup /tmp/backup1 /tmp/incbackup2 /tmp/incbackup3 -O tar -o - | \
        ssh dbserver 'tar xvf - -C /path/to/restored/cluster'

The bad thing is that such a pipe prevents parallelism from day 1, and
I'm afraid I don't have a better easy idea on how to have both at the
same time in the long term. (The last sketch below contrasts this with
today's double-write flow.)

-J.
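PS: To make the wal_level = minimal point above concrete, here is a
minimal repro sketch (paths and port are invented for illustration;
note the server itself refuses to start with wal_level = minimal
unless max_wal_senders is 0):

    initdb -D /tmp/minimal
    echo "wal_level = minimal"  >> /tmp/minimal/postgresql.conf
    echo "max_wal_senders = 0"  >> /tmp/minimal/postgresql.conf
    pg_ctl -D /tmp/minimal -l /tmp/minimal.log start
    pg_basebackup -D /tmp/basebackup -p 5432
    # => FATAL:  number of requested standby connections exceeds
    #    max_wal_senders (currently 0)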
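Here is also a sketch of how the LSN-only variant could shorten the
standby-resync steps above. The --incremental-lsn option is purely
hypothetical (nothing like it exists in the patch set, which sends a
manifest instead); the remaining tools and functions are real:

    # 1) grab the LSN where the impaired standby got stuck - cheap,
    #    no need to read 10TB of data:
    LSN=$(psql -h standby -At -c "SELECT pg_last_wal_replay_lsn()")
    # 2) hypothetical: incremental backup of the primary relative to
    #    that raw LSN instead of a previous backup's manifest
    pg_basebackup -h primary -D /tmp/incr --incremental-lsn="$LSN"
    # 3-5) copy /tmp/incr to the standby, lay it over the impaired
    #      datadir, and restart WAL replay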
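And finally, the double-write flow we have today versus the proposed
pipe. With the patch set as posted (assuming the -o output-directory
syntax), the reconstructed datadir hits disk once on the backup host
and a second time during the transfer:

    # write #1: reconstruct on the backup host
    pg_combinebackup -o /tmp/restored /tmp/backup1 /tmp/incbackup2 /tmp/incbackup3
    # write #2: transfer the result to the db server
    rsync -a /tmp/restored/ dbserver:/path/to/restored/cluster/

The hypothetical -O tar / -o - pipe above would write the data on the
target only once, at the cost of the parallelism problem mentioned.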
incrbackuptests-0.1.tgz