On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
> > v2-0001: not changed
>
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> consistent_lsn is present on standby.
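
To make the suggestion concrete, here is a rough Python sketch of a wait
loop that relies only on a timeout and a recovery check. It is not the
actual C code in the patch. The name wait_for_end_of_recovery, the
is_in_recovery callback (a stand-in for server_is_in_recovery()), and the
poll_interval, clock and sleep parameters are all hypothetical. Treating
a timeout of 0 as "no limit" is also an assumption.

```python
import time
from typing import Callable


def wait_for_end_of_recovery(
    is_in_recovery: Callable[[], bool],
    recovery_timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the server leaves recovery or the timeout expires.

    Returns True if recovery ended, False if the timeout was reached.
    Assumption: recovery_timeout <= 0 means "wait without a limit".
    """
    deadline = clock() + recovery_timeout
    # Hypothetical stand-in for repeatedly calling server_is_in_recovery().
    while is_in_recovery():
        # No check of the primary here: the timeout is the only way out.
        if recovery_timeout > 0 and clock() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

A test could then call it with a non-default timeout, for example
wait_for_end_of_recovery(check, 180), where check is whatever callable
reports the recovery state.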

That's exactly what I want to propose. Tomas convinced me offlist that
less is better: without a useful recovery progress reporting mechanism,
we cannot tell whether the server is still working on the recovery and
we should keep waiting.

> > v2-0002: not changed
>
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.

Following the same line of simplifying the code, we can: (a) add a loop
in check_subscriber() that waits until walreceiver is available on the
subscriber or (b) use a timeout. The main advantage of (a) is that the
primary slot is already available, but I'm afraid we need an escape
mechanism for the loop (timeout?).

I'll summarize all issues as soon as I finish the review of sync slot
support. I think we should avoid new development if we judge that the
item can be documented as a limitation for this version. Nevertheless, I
will share patches so you can give your opinion on whether it is an open
item or new development.

--
Euler Taveira
EDB   https://www.enterprisedb.com/