On Tue, Dec 5, 2023 at 7:11 PM Robert Haas <robertmh...@gmail.com> wrote:
[..v13 patchset]

The results with the v13 patchset are as follows:

* = requires a checkpoint on the primary when taking an incremental
backup on a standby that is too idle. Robert explained this in [1];
it is the "too-fast incremental backup" case, an artifact of the
testing scenario.

test_across_wallevelminimal.sh - GOOD
test_many_incrementals_dbcreate.sh - GOOD
test_many_incrementals.sh - GOOD
test_multixact.sh - GOOD
test_reindex_and_vacuum_full.sh - GOOD
test_standby_incr_just_backup.sh - GOOD*
test_truncaterollback.sh - GOOD
test_unlogged_table.sh - GOOD
test_full_pri__incr_stby__restore_on_pri.sh - GOOD
test_full_pri__incr_stby__restore_on_stby.sh - GOOD
test_full_stby__incr_stby__restore_on_pri.sh - GOOD*
test_full_stby__incr_stby__restore_on_stby.sh - GOOD*
test_incr_on_standby_after_promote.sh - GOOD*

test_incr_after_timelineincrease.sh (pg_ctl stop, pg_resetwal -l
00000002000000000000000E ..., pg_ctl start, pg_basebackup
--incremental) - GOOD, I've got:

    pg_basebackup: error: could not initiate base backup: ERROR:
    timeline 1 found in manifest, but not in this server's history

Comment: I was wondering whether it would make sense to teach
pg_resetwal to actually delete all WAL summaries after any
WAL/controlfile alteration?

test_stuck_walsummary.sh (pkill -STOP walsumm) - GOOD:

> This version also improves (at least, IMHO) the way that we wait for
> WAL summarization to finish. Previously, you either caught up fully
> within 60 seconds or you died. I didn't like that, because it seemed
> like some people would get timeouts when the operation was slowly
> progressing and would eventually succeed. So what this version does
> is:

    WARNING: still waiting for WAL summarization through 0/A0000D8
    after 10 seconds
    DETAIL: Summarization has reached 0/8000028 on disk and 0/80000F8
    in memory.
    [..]
    pg_basebackup: error: could not initiate base backup: ERROR: WAL
    summarization is not progressing
    DETAIL: Summarization is needed through 0/A0000D8, but is stuck at
    0/8000028 on disk and 0/80000F8 in memory.

Comment2: looks good to me!

test_pending_2pc.sh - GOOD on the most recent runs, but several times
during early testing (probably due to my own mishaps) I was hit by an
Abort/TRAP. I'm still investigating and trying to reproduce those:

    TRAP: failed Assert("summary_end_lsn >=
    WalSummarizerCtl->pending_lsn"), File: "walsummarizer.c", Line: 940

Regards,
-J.

[1] - https://www.postgresql.org/message-id/CA%2BTgmoYuC27_ToGtTTNyHgpn_eJmdqrmhJ93bAbinkBtXsWHaA%40mail.gmail.com
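
For readers following along: the waiting behaviour exercised by test_stuck_walsummary.sh (keep waiting while summarization advances, warn periodically, fail only when it stops progressing) can be illustrated with the small Python sketch below. This is not the server code. The function and parameter names, the polling loop, and the 60-second stall limit are all assumptions made for illustration; only the 10-second warning and the wording of the messages are taken from the output above.

```python
import time


class SummarizationStuck(Exception):
    """Raised when the summarized LSN stops advancing (hypothetical name)."""


def fmt_lsn(lsn: int) -> str:
    # Render a 64-bit LSN in the usual "high/low" hexadecimal form.
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def wait_for_summarization(target_lsn, get_summarized_lsn, *,
                           warn_every=10.0,   # from the WARNING above
                           stall_limit=60.0,  # assumption, not from the source
                           poll_interval=1.0, # assumption
                           clock=time.monotonic, sleep=time.sleep,
                           warn=print):
    """Wait until get_summarized_lsn() reaches target_lsn.

    The wait has no overall deadline: it fails only if no progress at all
    is observed for stall_limit seconds.
    """
    start = clock()
    last_lsn = get_summarized_lsn()
    last_progress = start
    next_warn = start + warn_every
    while last_lsn < target_lsn:
        now = clock()
        if now - last_progress >= stall_limit:
            raise SummarizationStuck(
                "WAL summarization is not progressing: needed through "
                f"{fmt_lsn(target_lsn)}, but is stuck at {fmt_lsn(last_lsn)}")
        if now >= next_warn:
            warn("still waiting for WAL summarization through "
                 f"{fmt_lsn(target_lsn)} after {int(now - start)} seconds")
            next_warn = now + warn_every
        sleep(poll_interval)
        lsn = get_summarized_lsn()
        if lsn > last_lsn:
            # Any forward movement resets the stall timer.
            last_lsn = lsn
            last_progress = clock()
    return last_lsn
```

With a summarizer that is frozen (as with pkill -STOP), the sketch warns every 10 seconds and then raises; with a slow but advancing summarizer it keeps waiting past 60 seconds and eventually returns.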