On Mon, Oct 30, 2023 at 6:46 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Thu, Sep 28, 2023 at 6:22 AM Jakub Wartak
> <jakub.war...@enterprisedb.com> wrote:
> > If that is still an area open for discussion: wouldn't it be better
> > to just specify an LSN, as that would allow resyncing a standby
> > across a major lag where the WAL to replay would be enormous? Given
> > a primary->standby setup where the standby got stuck on some LSN,
> > right now it would be:
> > 1) calculate the backup manifest of the desynced 10TB standby (how?
> >    using which tool?) - even if possible, that means reading 10TB of
> >    data instead of just supplying a number, doesn't it?
> > 2) back up the primary with such an incremental backup >= LSN
> > 3) copy the incremental backup to the standby
> > 4) apply it to the impaired standby
> > 5) restart the WAL replay
>
> As you may be able to tell from the flurry of posts and new patch
> sets, I'm trying hard to sort out the remaining open items that
> pertain to this patch set, and I'm now back to thinking about this
> one.
>
> TL;DR: I think the idea has some potential, but there are some
> pitfalls that I'm not sure how to address.
>
> I spent some time looking at how we currently use the data from the
> backup manifest. Currently, we do two things with it. First, when
> we're backing up each file, we check whether it's present in the
> backup manifest and, if not, we back it up in full. This actually
> feels fairly poor. If it makes any difference at all, then presumably
> the underlying algorithm is buggy and needs to be fixed. Maybe that
> should be ripped out altogether or turned into some kind of sanity
> check that causes a big explosion if it fails. Second, we check
> whether the WAL ranges reported by the client match up with the
> timeline history of the server (see PrepareForIncrementalBackup). This
> set of sanity checks seems fairly important to me, and I'd regret
> discarding them. I think there's some possibility that they might
> catch user error, like where somebody promotes multiple standbys and
> maybe they even get the same timeline on more than one of them, and
> then confusion might ensue. [..]
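(For anyone following along: the round trip under discussion looks
roughly like the sketch below. Flag spellings are as in the posted
patch set, so treat this as illustrative rather than final syntax.)

    # take a full backup; pg_basebackup writes backup_manifest by default
    pg_basebackup -c fast -D full
    # take an incremental backup relative to the full backup's manifest
    pg_basebackup -c fast -D incr --incremental=full/backup_manifest
    # reconstruct a synthetic full backup into a fresh output directory
    pg_combinebackup full incr -o restored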
> Another practical problem here is that, right now, pg_combinebackup
> doesn't have an in-place mode. It knows how to take a bunch of input
> backups and write out an output backup, but that output backup needs
> to go into a new, fresh directory (or directories plural, if there are
> user-defined tablespaces). I had previously considered adding such a
> mode, but the idea I had at the time wouldn't have worked for this
> case. I imagined that someone might want to run "pg_combinebackup
> --in-place full incr" and clobber the contents of the incr directory
> with the output, basically discarding the incremental backup you took
> in favor of a full backup that could have been taken at the same point
> in time. [..]

Thanks for answering! It all sounds like this
resync-standby-using-primary-incrbackup idea is not a fit for the
current pg_combinebackup, but rather for a new tool, hopefully in the
future. Such a tool could:
1) take the current LSN from the stuck standby
2) calculate the manifest on the lagged and offline standby (do we
   need to calculate the manifest checksum in that case? I cannot find
   code for it)
3) deliver the manifest to the primary via "UPLOAD_MANIFEST" and start
   fetching the differences
4) apply them using some form of copy-on-write, merging the old data
   and the incoming incremental-backup data into "$relfilenodeid.new",
   and then durable_unlink() the old file and
   durable_rename("$relfilenodeid.new", "$relfilenodeid")

Would that still be possible in theory? (It could use additional
safeguards, like renaming the control file when starting and renaming
it back just before finishing, to block startup if the resync hasn't
completed.) Also, judging by the comment near struct
IncrementalBackupInfo.manifest_files, it looks like even the checksums
are there more as a safeguard than as a core part of the
implementation(?). What I meant with the initial idea was not to
hinder the current effort, but to ask whether the current design would
stand in the way of such a cool addition in the future.

> One thing I also realized when thinking about this is that you could
> probably hose yourself with the patch set as it stands today by taking
> a full backup, downgrading to wal_level=minimal for a while, doing
> some WAL-skipping operations, upgrading to a higher WAL-level again,
> and then taking an incremental backup. I think the solution to that is
> probably for the WAL summarizer to refuse to run if wal_level=minimal.
> Then there would be a gap in the summary files which an incremental
> backup attempt would detect.

As per the earlier test [1], I already tried to simulate that in
incrbackuptests-0.1.tgz/test_across_wallevelminimal.sh, but it worked
(that test exercised the CTAS wal_level=minimal optimization, though:
a new relfilenode OID is used for the CTAS, so the file got included
in the incremental backup simply because it is a new file). I retested
with your v7 patch with asserts; same result.

When simulating with "BEGIN; TRUNCATE nightmare; COPY nightmare FROM
'/tmp/copy.out'; COMMIT;" on wal_level=minimal, it still recovers
correctly using the incremental backup, because the WAL contains:

rmgr: Storage, desc: CREATE base/5/36425
[..]
rmgr: XLOG, desc: FPI , blkref #0: rel 1663/5/36425 blk 0 FPW
[..]

That is, TRUNCATE assigns a new relfilenode each time, so such
relations will always be included in the backup in full, and the
wal_level=minimal optimizations kick in only for commands that assign
a new relfilenode. True/false?
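(A rough sketch of how I drove that check; the config snippet and the
WAL segment name are illustrative only:)

    # postgresql.conf for the WAL-skipping run
    # (wal_level=minimal requires max_wal_senders=0):
    #   wal_level = minimal
    #   max_wal_senders = 0
    psql -c "BEGIN; TRUNCATE nightmare; COPY nightmare FROM '/tmp/copy.out'; COMMIT;"
    # inspect the records for the new relfilenode (-R/--relation needs v15+)
    pg_waldump -p $PGDATA/pg_wal -R 1663/5/36425 000000010000000000000010

The relfilenode churn from TRUNCATE is also visible directly in
pg_class: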
postgres=# select oid, relfilenode, relname from pg_class where relname like 'night%' order by 1;
  oid  | relfilenode |       relname
-------+-------------+---------------------
 16384 |           0 | nightmare
 16390 |       36420 | nightmare_p0
 16398 |       36425 | nightmare_p1
 36411 |           0 | nightmare_pkey
 36413 |       36422 | nightmare_p0_pkey
 36415 |       36427 | nightmare_p1_pkey
 36417 |           0 | nightmare_brin_idx
 36418 |       36423 | nightmare_p0_ts_idx
 36419 |       36428 | nightmare_p1_ts_idx
(9 rows)

postgres=# truncate nightmare;
TRUNCATE TABLE
postgres=# select oid, relfilenode, relname from pg_class where relname like 'night%' order by 1;
  oid  | relfilenode |       relname
-------+-------------+---------------------
 16384 |           0 | nightmare
 16390 |       36434 | nightmare_p0
 16398 |       36439 | nightmare_p1
 36411 |           0 | nightmare_pkey
 36413 |       36436 | nightmare_p0_pkey
 36415 |       36441 | nightmare_p1_pkey
 36417 |           0 | nightmare_brin_idx
 36418 |       36437 | nightmare_p0_ts_idx
 36419 |       36442 | nightmare_p1_ts_idx
(9 rows)

-J.

[1] - https://www.postgresql.org/message-id/CAKZiRmzT%2BbX2ZYdORO32cADtfQ9DvyaOE8fsOEWZc2V5FkEWVg%40mail.gmail.com