Hi all,

I'm including additional details, as I am now able to reproduce this issue
a little more reliably.

Postgres Version: POSTGRES_14_9.R20230830.01_07
Vendor: Google Cloud SQL
Logical Replication Protocol version: 1

Here are the logs of an attempt that fails, followed by a retry a few
seconds later that succeeds:

2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres
STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
6/5AE67D79 (proto_version '1', publication_names
'peerflow_pub_wal_testing_2') <- FAILS
2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres ERROR:
 requested WAL segment 000000010000000600000059 has already been removed
2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres
STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
6/5AE67D79 (proto_version '1', publication_names
'peerflow_pub_wal_testing_2')  <- SUCCEEDS
2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres LOG:
 logical decoding found consistent point at 6/5A31F050
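
For reference, the LSNs in these logs can be mapped to WAL segment file
names, which is the same mapping pg_walfile_name() performs on the server.
The following is only a minimal sketch: it assumes timeline 1 and the
default 16 MB wal_segment_size, neither of which is confirmed for this
instance, and parse_lsn / wal_segment_name are hypothetical helper names.

```python
# Sketch: map an LSN such as "6/5AE67D79" to its WAL segment file name.
# Assumptions (not confirmed for this instance): timeline 1 and the
# default wal_segment_size of 16 MB.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size


def parse_lsn(lsn: str) -> int:
    """Convert 'X/Y' hex notation into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: str, timeline: int = 1,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-hex-digit name of the segment containing lsn."""
    segments_per_xlogid = 0x100000000 // segment_size
    segno = parse_lsn(lsn) // segment_size
    return "%08X%08X%08X" % (timeline,
                             segno // segments_per_xlogid,
                             segno % segments_per_xlogid)


if __name__ == "__main__":
    # The requested start position and the reported consistent point.
    for lsn in ("6/5AE67D79", "6/5A31F050"):
        print(lsn, "->", wal_segment_name(lsn))
```

Under these assumptions both 6/5AE67D79 and 6/5A31F050 fall in segment
00000001000000060000005A, one segment after the 000000010000000600000059
named in the error. That would be consistent with decoding starting from
an earlier position than the requested one, such as the slot's
restart_lsn, though the logs above do not show that value.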

Happy to include any additional details of my setup.

Thanks,
Kaushik


On Tue, Dec 26, 2023 at 10:36 AM Kaushik Iska <kaus...@peerdb.io> wrote:

> Dear PostgreSQL Community,
>
> I am seeking guidance regarding a recurring issue we've encountered with
> WAL segment removal during logical replication using the pgoutput plugin. We
> sporadically encounter an error indicating that a requested WAL segment has
> already been removed. This issue arises intermittently when executing
> START_REPLICATION. An example error message is as follows:
>
>
> requested WAL segment 000000010000146000000AE has already been removed
>
>
> Please note that this error is not specific to the segment mentioned
> above; it serves as an example of the type of error we are experiencing.
>
> Additional Context:
>
>
>    - max_slot_wal_keep_size is -1, logical_decoding_work_mem is 4 GB.
>    - The error seems to appear randomly and is not consistent.
>    - After a couple of retries, the replication process eventually succeeds.
>    - For one of the users it seems to be happening every 16 hours or so.
>
>
> Our approach involves starting with START_REPLICATION 0, replicating data
> in batches, and then restarting at the last LSN of the previous batch. We
> are trying to understand the root cause behind the intermittent removal of
> WAL segments during logical replication. Specifically, we are looking for
> insights into:
>
>
>    - The potential reasons for the WAL segments being reported as removed.
>    - Why this error occurs intermittently and why replication succeeds
>      after several retries.
>    - How to troubleshoot and resolve this issue, and whether it might be
>      related to our specific replication setup or is a characteristic of
>      pgoutput.
>
>
> Related Posts
>
>
>    - https://issues.redhat.com/browse/DBZ-590
>    - Troubleshooting Postgres Sources | Airbyte Documentation
>      <https://docs.airbyte.com/integrations/sources/postgres/postgres-troubleshooting#under-cdc-incremental-mode-there-are-still-full-refresh-syncs>
>    - https://fivetran.com/docs/databases/postgresql/troubleshooting/last-tracked-lsn-error
>
>
>
> Thank you very much for your time and assistance.
>
> Thanks,
>
> Kaushik Iska
>
>
