On Thu, May 23, 2024, at 5:54 AM, Amit Kapila wrote:
> On Wed, May 22, 2024 at 8:46 PM Euler Taveira <eu...@eulerto.com> wrote:
> >
> > Following the same line that simplifies the code, we can: (a) add a loop in
> > check_subscriber() that waits until walreceiver is available on subscriber or
> > (b) use a timeout. The main advantage of (a) is that the primary slot is already
> > available but I'm afraid we need a escape mechanism for the loop (timeout?).
> >
>
> Sorry, it is not clear to me why we need any additional loop in
> check_subscriber(), aren't we speaking about the problem in
> check_publisher() function?

The idea is to use check_subscriber() to check pg_stat_wal_receiver. Once this
view returns a row and primary_slot_name is set on the standby, the referred
replication slot should be active on the primary. Hence, the query in
check_publisher() makes sure that the referred replication slot is in use on
the primary.

> Why in the first place do we need to ensure that primary_slot_name is
> active on the primary? You mentioned something related to WAL
> retention but I don't know how that is related to this tool's
> functionality. If at all, we are bothered about WAL retention on the
> primary that should be the WAL corresponding to consistent_lsn
> computed by setup_publisher() but this check doesn't seem to ensure
> that.

Maybe it is too many checks. I'm afraid there isn't a simple way to find the
replication slot used by physical replication and make sure it is really in
use. I mean, if there is primary_slot_name = 'foo' on the standby, there is no
guarantee that the replication slot 'foo' exists on the primary. The idea is to
get the exact replication slot name used by physical replication so that it can
be dropped. It should be clearer once I post a patch. (Another idea is to relax
this check and rely only on primary_slot_name to drop this replication slot on
the primary. The replication slot might not exist, and that shouldn't be
reported as an error in this case.)

--
Euler Taveira
EDB   https://www.enterprisedb.com/
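
P.S. To make option (a) concrete, here is a rough sketch of a wait loop that
uses a timeout as the escape mechanism. It is not taken from any patch:
wait_for_condition(), check_fn and the fake_walreceiver stub are hypothetical
names used only for illustration. A real check would presumably run a query
over libpq instead of counting polls; the queries in the comments are my
assumption of what such checks could look like.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns true once the awaited condition holds. */
typedef bool (*check_fn) (void *ctx);

/*
 * Poll check() every interval_ms until it returns true or until roughly
 * timeout_ms has elapsed. Returns false on timeout, so the caller always
 * gets control back even if the walreceiver never shows up.
 */
static bool
wait_for_condition(check_fn check, void *ctx, long timeout_ms, long interval_ms)
{
	long		waited_ms = 0;

	assert(check != NULL && interval_ms > 0);

	for (;;)
	{
		struct timespec ts;

		if (check(ctx))
			return true;
		if (waited_ms >= timeout_ms)
			return false;		/* escape mechanism: give up */

		ts.tv_sec = interval_ms / 1000;
		ts.tv_nsec = (interval_ms % 1000) * 1000000L;
		nanosleep(&ts, NULL);
		waited_ms += interval_ms;
	}
}

/*
 * Stand-in for the real check. Assumption: on the standby it could be
 *   SELECT 1 FROM pg_catalog.pg_stat_wal_receiver
 * and, afterwards, on the primary
 *   SELECT 1 FROM pg_catalog.pg_replication_slots
 *   WHERE active AND slot_name = 'foo'
 * Here the "walreceiver" simply becomes available after ready_after polls.
 */
typedef struct
{
	int			polls;
	int			ready_after;
} fake_walreceiver;

static bool
fake_walreceiver_is_up(void *ctx)
{
	fake_walreceiver *w = ctx;

	return ++w->polls >= w->ready_after;
}
```

A caller would do something like
wait_for_condition(fake_walreceiver_is_up, &state, 60000, 1000) and report an
error if it returns false. The elapsed time is approximated by summing the
sleep intervals, which is good enough for a sketch but ignores the time spent
in the check itself.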