Hi all,

I'm including additional details, as I am able to reproduce this issue a little more reliably.

Postgres Version: POSTGRES_14_9.R20230830.01_07
Vendor: Google Cloud SQL
Logical Replication Protocol version: 1

Here are the logs of an attempt succeeding right after one fails:

2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres STATEMENT: START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL 6/5AE67D79 (proto_version '1', publication_names 'peerflow_pub_wal_testing_2')   <- FAILS
2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres ERROR: requested WAL segment 000000010000000600000059 has already been removed

2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres STATEMENT: START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL 6/5AE67D79 (proto_version '1', publication_names 'peerflow_pub_wal_testing_2')   <- SUCCEEDS
2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres LOG: logical decoding found consistent point at 6/5A31F050

Happy to include any additional details of my setup.

Thanks,
Kaushik

On Tue, Dec 26, 2023 at 10:36 AM Kaushik Iska <kaus...@peerdb.io> wrote:

> Dear PostgreSQL Community,
>
> I am seeking guidance regarding a recurring issue we've encountered with
> WAL segment removal during logical replication using the pgoutput plugin.
> We sporadically encounter an error indicating that a requested WAL segment
> has already been removed. This issue arises intermittently when executing
> START_REPLICATION. An example error message is as follows:
>
>     requested WAL segment 000000010000146000000AE has already been removed
>
> Please note that this error is not specific to the segment mentioned
> above; it serves as an example of the type of error we are experiencing.
>
> Additional Context:
>
> - max_slot_wal_keep_size is -1, logical_decoding_work_mem is 4 GB.
> - The error seems to appear randomly and is not consistent.
> - After a couple of retries, the replication process eventually succeeds.
> - For one of the users it seems to be happening every 16 hours or so.
>
> Our approach involves starting with START_REPLICATION 0, replicating data
> in batches, and then restarting at the last LSN of the previous batch. We
> are trying to understand the root cause behind the intermittent removal of
> WAL segments during logical replication. Specifically, we are looking for
> insights into:
>
> - The potential reasons for the WAL segments being reported as removed.
> - Why this error occurs intermittently and why replication succeeds
>   after several retries.
> - Any advice on troubleshooting and resolving this issue, or insights
>   into whether it might be related to our specific replication setup or a
>   characteristic of pgoutput, would be highly valuable.
>
> Related Posts
>
> - https://issues.redhat.com/browse/DBZ-590
> - Troubleshooting Postgres Sources | Airbyte Documentation
>   <https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs>
> - https://fivetran.com/docs/databases/postgresql/troubleshooting/last-tracked-lsn-error
>
> Thank you very much for your time and assistance.
>
> Thanks,
> Kaushik Iska
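
For anyone reading the logs above, it may help to map an LSN to the WAL segment file it lives in. The following is a minimal Python sketch, assuming the default 16 MB WAL segment size and timeline 1 (neither is confirmed for this Cloud SQL instance); the helper names parse_lsn and wal_segment_name are made up for illustration. It mirrors what the server-side function pg_walfile_name() computes.

```python
# Assumption: default WAL segment size of 16 MB (not confirmed for this instance).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'X/Y' (two hex numbers) to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: str, timeline: int = 1,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-character WAL segment file name containing the given LSN."""
    segment_number = parse_lsn(lsn) // segment_size
    # Number of segments per 4 GB "log file" (256 for 16 MB segments).
    segments_per_log = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_log,
        segment_number % segments_per_log,
    )


if __name__ == "__main__":
    # LSN requested in START_REPLICATION in the logs above.
    print(wal_segment_name("6/5AE67D79"))  # 00000001000000060000005A
    # Consistent point reported by the successful attempt.
    print(wal_segment_name("6/5A31F050"))  # 00000001000000060000005A
```

Under these assumptions, both the requested LSN and the reported consistent point fall in segment 00000001000000060000005A, while the error names the preceding segment 000000010000000600000059. Logical decoding begins reading WAL at the slot's restart_lsn rather than at the requested LSN, so the failed attempt may have needed WAL from before the requested position; the logs alone do not show the slot's restart_lsn at that moment.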