I am unfortunately not really familiar with Google Cloud SQL internals as
well. But we have seen this happen on Amazon RDS as well.

Could it be possible that we are requesting a future WAL segment, say WAL
upto X is written and we are asking for X + 1? It could be that the error
message is misleading.

I do not have the information from pg_replication_slots as I have
terminated the test. I am fairly certain that I can reproduce this again. I
will gather both the restart_lsn and contents of pg_wal for the failed
segment. Is there any other information that would help debug this further?

On Fri, Dec 29, 2023 at 4:16 PM Tomas Vondra <tomas.von...@enterprisedb.com>
wrote:

> On 12/27/23 16:31, Kaushik Iska wrote:
> > Hi all,
> >
> > I'm including additional details, as I am able to reproduce this issue a
> > little more reliably.
> >
> > Postgres Version: POSTGRES_14_9.R20230830.01_07
> > Vendor: Google Cloud SQL
> > Logical Replication Protocol version 1
> >
>
> I don't know much about Google Cloud SQL internals. Is it relatively
> close to Postgres (as e.g. RDS) or are the internals very different /
> modified for cloud environments?
>
> > Here are the logs of attempt succeeding right after it fails:
> >
> > 2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres
> > STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> > 6/5AE67D79 (proto_version '1', publication_names
> > 'peerflow_pub_wal_testing_2') <- FAILS
> > 2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres
> > ERROR:  requested WAL segment 000000010000000600000059 has already been
> > removed
> > 2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres
> > STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> > 6/5AE67D79 (proto_version '1', publication_names
> > 'peerflow_pub_wal_testing_2')  <- SUCCEEDS
> > 2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres
> > LOG:  logical decoding found consistent point at 6/5A31F050
> >
> > Happy to include any additional details of my setup.
> >
>
> I personally don't see how could this fail and then succeed, unless
> Google does something smart with the WAL segments under the hood. Surely
> we try to open the same WAL segment (given the LSN is the same), so how
> could it not exist and then exist?
>
> As Ron already suggested, it might be useful to see information for the
> replication slot peerflow_slot_wal_testing_2 (especially the restart_lsn
> value). Also, maybe show the contents of pg_wal (especially for the
> segment referenced in the error message).
>
> Can you reproduce this outside Google cloud environment?
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Reply via email to