On Mon, Jul 7, 2025 at 9:53 AM shveta malik <shveta.ma...@gmail.com> wrote: > > Thanks for the patch. > > > I couldn't figure out whether the query on primary to fetch all the > > slots to be synchronized should filter based on invalidation_reason > > and conflicting or not. According to synchronize_slots(), it seems > > that we retain invalidated slots on standby when failover = true and > > they would remain with synced = true on standby. Is that right? > > > > Yes, that’s correct. We sync the invalidation status of replication > slots from the primary to the standby and then stop synchronizing any > slots that have been marked as invalidated, retaining synced flag as > true. IMO, there's no need to filter out conflicting slots on the > primary, because instead of excluding them there and showing > everything as failover-ready on the standby, the correct approach is > to reflect the actual state on standby.This means conflicting slots > will appear as non-failover-ready on the standby. That’s why Step 3 > also considers conflicting flag in its evaluation.
Thanks for the explanation. WFM. If a slot is invalidated because RS_INVAL_WAL_REMOVED, conflicting = false, but the slot is not useful standby. But then it's not useful on primary as well. Is that why we are not including (invalidation_reason IS NOT NULL) condition along in (synced AND NOT temporary AND NOT conflicting)? > > 1) > +/* primary # */ SELECT array_agg(quote_literal(r.slot_name)) AS slots > + FROM pg_replication_slots r > + WHERE r.failover AND NOT r.temporary; > > On primary, there is no need to check temporary-status. We do not > allow setting failover as true for temporary slots. Why does synchronize_slots() has it in its query? > > 2) > Although not directly related to the concerns addressed in the given > patch, I think it would be helpful to add a note in the original doc > stating that Steps 1 and 2 should be executed on each subscriber node > that will be served by the standby after failover. There's a bit of semantic repeatition here since an earlier paragraph mentions a given subscriber. But I think overall it's better to be more clear than being less clear. > > I have attached a top-up patch with the above changes and a few more > trivial changes. Please include it if you find it okay. Thanks. Included. I have made a few edits and included them in the attached patch. -- Best Wishes, Ashutosh Bapat
From 09a37694180af7adcd56b4201bbd9a6fc945e586 Mon Sep 17 00:00:00 2001 From: Ashutosh Bapat <ashutosh.bapat....@gmail.com> Date: Fri, 4 Jul 2025 19:35:47 +0530 Subject: [PATCH] Clarify logical replication failover document Clarify that the existing steps in logical replication failover section are meant for a given subscriber. Also clarify that the first two steps are for a PostgreSQL subscriber. Add a separate paragraph to specify the query to be used to make sure that all the replication slots have been synchronized to the required standby in case of a planned failover. Author: Ashutosh Bapat <ashutosh.bapat....@gmail.com> Author: Shveta Malik <shveta.ma...@gmail.com> Discussion: https://www.postgresql.org/message-id/CAExHW5uiZ-fF159=jwbwpmbjzezdtmctbn+hd4mrurlcg2u...@mail.gmail.com --- doc/src/sgml/logical-replication.sgml | 40 ++++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml index f317ed9c50e..baf6017125d 100644 --- a/doc/src/sgml/logical-replication.sgml +++ b/doc/src/sgml/logical-replication.sgml @@ -709,8 +709,8 @@ HINT: To initiate replication, you must manually create the replication slot, e </para> <para> - To confirm that the standby server is indeed ready for failover, follow these - steps to verify that all necessary logical replication slots have been + To confirm that the standby server is indeed ready for failover for a given subscriber, follow these + steps to verify that all the logical replication slots required by that subscriber have been synchronized to the standby server: </para> @@ -782,10 +782,42 @@ HINT: To initiate replication, you must manually create the replication slot, e <para> If all the slots are present on the standby server and the result (<literal>failover_ready</literal>) of the above SQL query is true, then - existing subscriptions can continue subscribing to publications now on the - new primary server. + existing subscriptions can continue subscribing to publications on the new + primary server. + </para> + + <para> + The first two steps in the above procedure are meant for a + <productname>PostgreSQL</productname> subscriber. It is recommended to run + these steps on each subscriber node, that will be served by the designated + standby after failover, to obtain the complete list of replication + slots. This list can then be verified in Step 3 to ensure failover readiness. + Non-<productname>PostgreSQL</productname> subscribers, on the other hand, may + use their own methods to identify the replication slots used by their + respective subscriptions. + </para> + + <para> + In some cases, such as during a planned failover, it is necessary to confirm + that all subscribers, whether <productname>PostgreSQL</productname> or + non-<productname>PostgreSQL</productname>, will be able to continue + replication after failover to a given standby server. In such cases, use the + following SQL, instead of performing the first two steps above, to identify + which replication slots on the primary need to be synced to the standby that + is intended for promotion. This query returns the relevant replication slots + associated with all the failover-enabled subscriptions. </para> + <para> +<programlisting> +/* primary # */ SELECT array_agg(quote_literal(r.slot_name)) AS slots + FROM pg_replication_slots r + WHERE r.failover; + slots +------- + {'sub1','sub2','sub3', 'pg_16394_sync_16385_7394666715149055164'} +(1 row) +</programlisting></para> </sect1> <sect1 id="logical-replication-row-filter"> base-commit: 2633dae2e4876a9b7cb90ba025e930a553e5107f -- 2.34.1