On Thu, May 23, 2024, at 5:54 AM, Amit Kapila wrote:
> On Wed, May 22, 2024 at 8:46 PM Euler Taveira <eu...@eulerto.com> wrote:
> >
> > Following the same line that simplifies the code, we can: (a) add a loop in
> > check_subscriber() that waits until walreceiver is available on subscriber or
> > (b) use a timeout. The main advantage of (a) is that the primary slot is already
> > available but I'm afraid we need an escape mechanism for the loop (timeout?).
> >
> 
> Sorry, it is not clear to me why we need any additional loop in
> check_subscriber(), aren't we speaking about the problem in
> check_publisher() function?

The idea is to use check_subscriber() to check pg_stat_wal_receiver. Once this
view returns a row and primary_slot_name is set on the standby, the referenced
replication slot should be active on the primary. Hence, the query in
check_publisher() makes sure that the referenced replication slot is in use on
the primary.
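
To make option (a) concrete, here is a rough sketch of the wait loop with a
timeout as the escape mechanism. It is Python for brevity, not the C the tool
is written in, and it is not the patch. The names wait_for_walreceiver and
probe, and the timeout and interval defaults, are made up for illustration;
probe stands for whatever runs the query on the standby and returns the slot
name, or None while the view has no row.

```python
import time
from typing import Callable, Optional

# Query the probe is assumed to run on the standby. The view has at most one
# row; slot_name is the replication slot used by the walreceiver.
WALRECEIVER_QUERY = "SELECT slot_name FROM pg_catalog.pg_stat_wal_receiver"


def wait_for_walreceiver(probe: Callable[[], Optional[str]],
                         timeout: float = 60.0,
                         interval: float = 1.0,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep
                         ) -> Optional[str]:
    """Poll probe() until it returns a slot name or the timeout elapses.

    Returns the slot name, or None if the walreceiver never showed up.
    probe, timeout and interval are hypothetical, not names from the tool.
    """
    deadline = clock() + timeout
    while True:
        slot = probe()
        if slot is not None:
            return slot          # walreceiver is up and reports a slot
        if clock() >= deadline:
            return None          # escape mechanism: give up after timeout
        sleep(interval)
```

The probe runs at least once even with a zero timeout, so a standby that is
already streaming never waits.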

> Why in the first place do we need to ensure that primary_slot_name is
> active on the primary? You mentioned something related to WAL
> retention but I don't know how that is related to this tool's
> functionality. If at all, we are bothered about WAL retention on the
> primary that should be the WAL corresponding to consistent_lsn
> computed by setup_publisher() but this check doesn't seem to ensure
> that.

Maybe it is a lot of checks. I'm afraid there isn't a simple way to find the
replication slot and make sure it is used by physical replication. I mean, if
there is primary_slot_name = 'foo' on the standby, there is no guarantee that
the replication slot 'foo' exists on the primary. The idea is to get the exact
replication slot name used by physical replication so that it can be dropped.
Once I post a patch, it should be clear. (Another idea is to relax this check
and rely only on primary_slot_name to drop this replication slot on the
primary. The replication slot might not exist, and the tool shouldn't return an
error in this case.)
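
The relaxed variant in the parentheses could look roughly like the sketch
below. Again it is Python rather than C and only an illustration, not the
patch. drop_primary_slot and execute are hypothetical names; execute stands for
a helper that runs a parameterized query on the primary and returns the rows.
Selecting from pg_replication_slots means a missing slot yields zero rows
instead of an error.

```python
# Drop the slot only if it exists: the WHERE clause filters on the view, so
# pg_drop_replication_slot() is never called for a slot that is not there.
DROP_IF_EXISTS_SQL = (
    "SELECT pg_catalog.pg_drop_replication_slot(slot_name) "
    "FROM pg_catalog.pg_replication_slots "
    "WHERE slot_name = %s"
)


def drop_primary_slot(execute, primary_slot_name):
    """Return True if a slot was dropped, False if it was unset or absent.

    execute(sql, params) is a hypothetical helper returning a list of rows.
    """
    if not primary_slot_name:
        return False             # primary_slot_name is not set on the standby
    rows = execute(DROP_IF_EXISTS_SQL, (primary_slot_name,))
    return len(rows) > 0         # zero rows: the slot did not exist
```

Note that pg_drop_replication_slot() still fails if the slot is active, so
this assumes the standby is no longer streaming from it at that point.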


--
Euler Taveira
EDB   https://www.enterprisedb.com/
