On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
> >
> > v2-0001: not changed
> >
> 
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> constent_lsn is present on standby.

That's exactly what I want to propose as Tomas convinced me offlist that less is
better when we don't have a useful recovery progress reporting mechanism to make
sure it is still working on the recovery and we should wait.

> > v2-0002: not changed
> >
> 
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.

Following the same line that simplifies the code, we can: (a) add a loop in
check_subscriber() that waits until walreceiver is available on subscriber or
(b) use a timeout. The main advantage of (a) is that the primary slot is already
available but I'm afraid we need a escape mechanism for the loop (timeout?).

I'll summarize all issues as soon as I finish the review of sync slot support. I
think we should avoid new development if we judge that the item can be
documented as a limitation for this version. Nevertheless, I will share patches
so you can give your opinion on whether it is an open item or new development.


--
Euler Taveira
EDB   https://www.enterprisedb.com/

Reply via email to