Re: Synchronizing slots from primary to standby
On Fri, Jun 7, 2024 at 7:57 AM Zhijie Hou (Fujitsu) wrote: > > Thanks for the comments! Here is the V6 patch that addressed these. > I have pushed this after making minor changes in the wording. I have also changed one of the queries in the docs to ignore the NULL slot_name values. -- With Regards, Amit Kapila.
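For reference, "ignore the NULL slot_name values" means the slot-listing query skips subscriptions whose slot name is NULL (for example, subscriptions created with slot_name = NONE). A minimal sketch of such a filter follows; the tablesync-slot branch of the documented query is omitted for brevity and the committed wording may differ:

SELECT array_agg(slot_name) AS slots
FROM (
    -- main slots of failover-enabled subscriptions; subslotname is NULL
    -- when the subscription was created with slot_name = NONE
    SELECT subslotname AS slot_name
    FROM pg_subscription
    WHERE subfailover
) AS s
WHERE slot_name IS NOT NULL;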
RE: Synchronizing slots from primary to standby
On Thursday, June 6, 2024 12:21 PM Peter Smith wrote: > > Hi, here are some review comments for the docs patch v5-0001. Thanks for the comments! Here is the V6 patch that addressed these. Best Regards, Hou zj v6-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch Description: v6-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
Re: Synchronizing slots from primary to standby
Hi, here are some review comments for the docs patch v5-0001. Apart from these it LGTM. == doc/src/sgml/logical-replication.sgml 1. + + On the subscriber node, use the following SQL to identify which slots + should be synced to the standby that we plan to promote. This query will + return the relevant replication slots, including the main slots and table + synchronization slots associated with the failover enabled subscriptions. /failover enabled/failover-enabled/ ~~~ 2. + + If all the slots are present on the standby server and result + (failover_ready) of is true, then existing subscriptions + can continue subscribing to publications now on the new primary server + without any loss of data. + Hmm. It looks like there is some typo or missing words here: "of is true". Did you mean something like: "of the above SQL query is true"? == Kind Regards, Peter Smith. Fujitsu Australia
RE: Synchronizing slots from primary to standby
On Wednesday, June 5, 2024 2:32 PM Peter Smith wrote: > Hi. Here are some minor review comments for the docs patch v4-0001. Thanks for the comments! > The SGML file wrapping can be fixed to fill up to 80 cols for some of the > paragraphs. Unlike comments in C code, I think we don't enforce the 80-column limit in doc files unless a line is too long to read. I checked the doc once and think it's OK. Here is the V5 patch, which addresses Peter's comments and Amit's comments [1]. [1] https://www.postgresql.org/message-id/CAA4eK1%2Bq1MYGgF3-LZCj6Xd0idujnjbTsfk-RqU%2BC51wYGaD5g%40mail.gmail.com Best Regards, Hou zj v5-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch Description: v5-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
Re: Synchronizing slots from primary to standby
Hi. Here are some minor review comments for the docs patch v4-0001. == doc/src/sgml/logical-replication.sgml 1. General The SGML file wrapping can be fixed to fill up to 80 cols for some of the paragraphs. ~~~ 2. + standby is promoted. They can continue subscribing to publications now on the + new primary server without any loss of data. But please note that in case of + asynchronous replication, there remains a risk of data loss for transactions + that have been committed on the former primary server but have yet to be + replicated to the new primary server. + /in case/in the case/ /But please note that.../Note that.../ == Kind Regards, Peter Smith. Fujitsu Australia
Re: Synchronizing slots from primary to standby
On Wed, Jun 5, 2024 at 7:52 AM Zhijie Hou (Fujitsu) wrote: > > Attach the V4 doc patch which addressed Peter and Bertrand's comments. > Few comments: 1. + On the subscriber node, use the following SQL to identify + which slots should be synced to the standby that we plan to promote. + +test_sub=# SELECT + array_agg(slot_name) AS slots + FROM + (( + SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slot_name + FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT s.oid AS subid, s.subslotname as slot_name + FROM pg_subscription s + WHERE s.subfailover + )); + slots +--- + {sub1,sub2,sub3} This should additionally say what exactly this SQL is doing to fetch the required slots. 2. If standby_slot_names is + not configured correctly, it is highly recommended to run this step after + the primary server is down, otherwise the results of the query can vary + due to the ongoing replication on the logical subscribers from the primary + server. + + + + +On the subscriber node, check the last replayed WAL. +This step needs to be run on any database that includes failover enabled +subscriptions. + +test_sub=# SELECT + MAX(remote_lsn) AS remote_lsn_on_subscriber + FROM If the 'standby_slot_names' is not configured then we can't ensure that standby is always ahead because what if immediately after running this query the additional WAL got synced to the subscriber before standby? Now, as you mentioned users can first shutdown primary to ensure that no additional WAL is sent to the subscriber. After that, it is possible that one can use these complex queries to ensure that the subscriber is behind the standby but it is better to encourage users to use standby_slot_names to ensure the same. If at all we get such use cases and or requirements then we can add such additional steps after understanding the user's requirements. For now, we should remove these additional steps. -- With Regards, Amit Kapila.
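For reference, an annotated restatement of the query quoted in comment 1 above (same logic; comments and a derived-table alias added so it runs as-is):

SELECT array_agg(slot_name) AS slots
FROM ((
    -- tablesync slots that still exist for failover-enabled subscriptions
    -- (table copy finished, srsubstate = 'f', but the table is not yet
    -- synchronized); their names are built from the subscription OID,
    -- the relation OID and the system identifier
    SELECT r.srsubid AS subid,
           CONCAT('pg_', srsubid, '_sync_', srrelid, '_',
                  ctl.system_identifier) AS slot_name
    FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s
    WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover
) UNION (
    -- main slots of all failover-enabled subscriptions
    SELECT s.oid AS subid, s.subslotname AS slot_name
    FROM pg_subscription s
    WHERE s.subfailover
)) AS failover_slots;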
RE: Synchronizing slots from primary to standby
On Thursday, May 23, 2024 1:34 PM Peter Smith wrote: Thanks for the comments. I addressed most of the comments except the following one, which I am not sure about: > 5b. > Patch says "on the subscriber node", but isn't that the simplest case? > e.g. maybe there are multiple nodes having subscriptions for these > publications. Maybe the sentence needs to account for case of subscribers on > >1 nodes. I think it's not necessary to mention the multiple-nodes case, as in that case the user can just perform the same steps on each node that has failover subscriptions. > Is there no way to discover this information by querying the publisher? I am not aware of a way for the user to get the necessary info, such as replication origin progress, on the publisher, because such information is only available on the subscriber. Attached is the V4 doc patch, which addressed Peter's and Bertrand's comments. Best Regards, Hou zj v4-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch Description: v4-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
RE: Synchronizing slots from primary to standby
On Wednesday, May 8, 2024 5:21 PM Bertrand Drouvot wrote: > A few comments: Thanks for the comments! > 2 === > > +test_sub=# SELECT > + array_agg(slotname) AS slots > + FROM > + (( > + SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', > srrelid, '_', ctl.system_identifier) AS slotname > + FROM pg_control_system() ctl, pg_subscription_rel r, > pg_subscription s > + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND > s.subfailover > + ) UNION ( > > I guess this format comes from ReplicationSlotNameForTablesync(). What > about creating a SQL callable function on top of it and make use of it in the > query above? (that would ensure to keep the doc up to date even if the format > changes in ReplicationSlotNameForTablesync()). We could add a new function as suggested, but I think it's not the right time (beta1) to add it, because a new function would require a catversion bump, which I don't think is worth it at this stage. I think we can consider this after the release, once we have perhaps gathered more use cases for the new function you suggested. > > 3 === > > +test_sub=# SELECT > + MAX(remote_lsn) AS remote_lsn_on_subscriber > + FROM > + (( > + SELECT (CASE WHEN r.srsubstate = 'f' THEN > pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_', r.srrelid), > false) > + WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn > END) AS remote_lsn > + FROM pg_subscription_rel r, pg_subscription s > + WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid > AND > s.subfailover > + ) UNION ( > + SELECT pg_replication_origin_progress(CONCAT('pg_', > s.oid), false) AS remote_lsn > + FROM pg_subscription s > + WHERE s.subfailover > + )); > > What about adding a join to pg_replication_origin to get rid of the > "hardcoded" > format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"? I tried a bit, but it doesn't seem feasible to get the relationship between subscription and origin by querying pg_subscription and pg_replication_origin. Best Regards, Hou zj
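To illustrate the idea only (purely hypothetical; no such function exists in core at the time of writing), a SQL-callable wrapper around ReplicationSlotNameForTablesync() would let the documented query drop the hard-coded slot-name format, roughly:

-- Hypothetical sketch: assumes a function pg_tablesync_slot_name(subid oid, relid oid)
-- exposing ReplicationSlotNameForTablesync(); it does not exist today.
SELECT pg_tablesync_slot_name(r.srsubid, r.srrelid) AS slot_name
FROM pg_subscription_rel r
JOIN pg_subscription s ON s.oid = r.srsubid
WHERE r.srsubstate = 'f' AND s.subfailover;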
Re: Synchronizing slots from primary to standby
Here are some review comments for the docs patch v3-0001. == Commit message 1. This patch adds detailed documentation for the slot sync feature including examples to guide users on how to verify that all slots have been successfully synchronized to the standby server and how to confirm whether the subscription can continue subscribing to publications on the promoted standby server. ~ This may be easier to read if you put it in bullet form like below: SUGGESTION It includes examples to guide the user: * How to verify that all slots have been successfully synchronized to the standby server * How to confirm whether the subscription can continue subscribing to publications on the promoted standby server == doc/src/sgml/high-availability.sgml 2. + +If you have opted for synchronization of logical slots (see +), +then before switching to the standby server, it is recommended to check +if the logical slots synchronized on the standby server are ready +for failover. This can be done by following the steps described in +. + + Maybe it is better to call this feature "logical replication slot synchronization" to be more consistent with the title of section 47.2.3. SUGGESTION If you have opted for logical replication slot synchronization (see ... == doc/src/sgml/logical-replication.sgml 3. + + When the publisher server is the primary server of a streaming replication, + the logical slots on that primary server can be synchronized to the standby + server by specifying failover = true when creating + subscriptions for those publications. Enabling failover ensures a seamless + transition of those subscriptions after the standby is promoted. They can + continue subscribing to publications now on the new primary server without + losing any data that has been flushed to the new primary server. + + 3a. BEFORE When the publisher server is the primary server of... SUGGESTION When publications are defined on the primary server of... ~ 3b. Enabling failover... Maybe say "Enabling the failover parameter..." and IMO there should also be a link to the CREATE SUBSCRIPTION failover parameter so the user can easily navigate there to read more about it. ~ 3c. BEFORE They can continue subscribing to publications now on the new primary server without losing any data that has been flushed to the new primary server. SUGGESTION (removes some extra info I did not think was needed) They can continue subscribing to publications now on the new primary server without any loss of data. ~~~ 4. + + Because the slot synchronization logic copies asynchronously, it is + necessary to confirm that replication slots have been synced to the standby + server before the failover happens. Furthermore, to ensure a successful + failover, the standby server must not be lagging behind the subscriber. It + is highly recommended to use + standby_slot_names + to prevent the subscriber from consuming changes faster than the hot standby. + To confirm that the standby server is indeed ready for failover, follow + these 2 steps: + IMO the last sentence "To confirm..." should be a new paragraph. ~~~ 5. + + Firstly, on the subscriber node, use the following SQL to identify + which slots should be synced to the standby that we plan to promote. AND + + Next, check that the logical replication slots identified above exist on + the standby server and are ready for failover. ~~ 5a. I don't think you need to say "Firstly," and "Next," because the order to do these steps is already self-evident. ~ 5b. 
Patch says "on the subscriber node", but isn't that the simplest case? e.g. maybe there are multiple nodes having subscriptions for these publications. Maybe the sentence needs to account for case of subscribers on >1 nodes. Is there no way to discover this information by querying the publisher? ~~~ 6. + +test_sub=# SELECT + array_agg(slotname) AS slots + FROM ... + +test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS failover_ready + FROM pg_replication_slots + WHERE slot_name IN ('sub1','sub2','sub3'); The example SQL for "1a" refers to 'slotname', but the example SQL for "1b" refers to "slot_name" (i.e. with underscore). It might be better if those are consistently called 'slot_name'. ~~~ 7. + + + Confirm that the standby server is not lagging behind the subscribers. + This step can be skipped if + standby_slot_names + has been correctly configured. If standby_slot_names is not configured + correctly, it is highly recommended to run this step after the primary + server is down, otherwise the results of the query may vary at different + points of time due to the ongoing replication on the logical subscribers + from the primary server. + 7a. I felt that the step should just say "Confirm that the st
Re: Synchronizing slots from primary to standby
Hi, On Mon, Apr 29, 2024 at 11:58:09AM +, Zhijie Hou (Fujitsu) wrote: > On Monday, April 29, 2024 5:11 PM shveta malik wrote: > > > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik > > wrote: > > > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > > wrote: > > > > > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > > > > > Hi, > > > > > > > > > > > > Since the standby_slot_names patch has been committed, I am > > > > > > attaching the last doc patch for review. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > 1 === > > > > > > > > > > + continue subscribing to publications now on the new primary > > > > > + server > > > > > without > > > > > + any data loss. > > > > > > > > > > I think "without any data loss" should be re-worded in this > > > > > context. Data loss in the sense "data committed on the primary and > > > > > not visible on the subscriber in case of failover" can still occurs > > > > > (in case > > synchronous replication is not used). > > > > > > > > > > 2 === > > > > > > > > > > + If the result (failover_ready) of both above > > > > > steps is > > > > > + true, existing subscriptions will be able to continue without data > > loss. > > > > > + > > > > > > > > > > I don't think that's true if synchronous replication is not used. > > > > > Say, > > > > > > > > > > - synchronous replication is not used > > > > > - primary is not able to reach the standby anymore and > > > > > standby_slot_names is set > > > > > - new data is inserted into the primary > > > > > - then not replicated to subscriber (due to standby_slot_names) > > > > > > > > > > Then I think the both above steps will return true but data would > > > > > be lost in case of failover. > > > > > > > > Thanks for the comments, attach the new version patch which reworded > > > > the above places. Thanks! > Here is the V3 doc patch. Thanks! A few comments: 1 === + losing any data that has been flushed to the new primary server. Worth to add a few words about possible data loss, something like? Please note that in case synchronous replication is not used and standby_slot_names is set correctly, it might be possible to lose data that would have been committed on the old primary server (in case the standby was not reachable during that time for example). 2 === +test_sub=# SELECT + array_agg(slotname) AS slots + FROM + (( + SELECT r.srsubid AS subid, CONCAT('pg_', srsubid, '_sync_', srrelid, '_', ctl.system_identifier) AS slotname + FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover + ) UNION ( I guess this format comes from ReplicationSlotNameForTablesync(). What about creating a SQL callable function on top of it and make use of it in the query above? (that would ensure to keep the doc up to date even if the format changes in ReplicationSlotNameForTablesync()). 
3 === +test_sub=# SELECT + MAX(remote_lsn) AS remote_lsn_on_subscriber + FROM + (( + SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_', r.srsubid, '_', r.srrelid), false) + WHEN r.srsubstate IN ('s', 'r') THEN r.srsublsn END) AS remote_lsn + FROM pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate IN ('f', 's', 'r') AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT pg_replication_origin_progress(CONCAT('pg_', s.oid), false) AS remote_lsn + FROM pg_subscription s + WHERE s.subfailover + )); What about adding a join to pg_replication_origin to get rid of the "hardcoded" format "CONCAT('pg_', r.srsubid, '_', r.srrelid)" and "CONCAT('pg_', s.oid)"? Idea behind 2 === and 3 === is to have the queries as generic as possible and not rely on a hardcoded format (that would be more difficult to maintain should those formats change in the future). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Mon, Apr 29, 2024 at 5:28 PM Zhijie Hou (Fujitsu) wrote: > > On Monday, April 29, 2024 5:11 PM shveta malik wrote: > > > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik > > wrote: > > > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > > wrote: > > > > > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > > > > > Hi, > > > > > > > > > > > > Since the standby_slot_names patch has been committed, I am > > > > > > attaching the last doc patch for review. > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > 1 === > > > > > > > > > > + continue subscribing to publications now on the new primary > > > > > + server > > > > > without > > > > > + any data loss. > > > > > > > > > > I think "without any data loss" should be re-worded in this > > > > > context. Data loss in the sense "data committed on the primary and > > > > > not visible on the subscriber in case of failover" can still occurs > > > > > (in case > > synchronous replication is not used). > > > > > > > > > > 2 === > > > > > > > > > > + If the result (failover_ready) of both above > > > > > steps is > > > > > + true, existing subscriptions will be able to continue without data > > loss. > > > > > + > > > > > > > > > > I don't think that's true if synchronous replication is not used. > > > > > Say, > > > > > > > > > > - synchronous replication is not used > > > > > - primary is not able to reach the standby anymore and > > > > > standby_slot_names is set > > > > > - new data is inserted into the primary > > > > > - then not replicated to subscriber (due to standby_slot_names) > > > > > > > > > > Then I think the both above steps will return true but data would > > > > > be lost in case of failover. > > > > > > > > Thanks for the comments, attach the new version patch which reworded > > > > the above places. > > > > > > Thanks for the patch. > > > > > > Few comments: > > > > > > 1) Tested the steps, one of the queries still refers to > > > 'conflict_reason'. I think it should refer 'conflicting'. > > Thanks for catching this. Fixed. > > > > > > > 2) Will it be good to mention that in case of planned promotion, it is > > > recommended to run pg_sync_replication_slots() as last sync attempt > > > before we run failvoer-ready validation steps? This can be mentioned > > > in high-availaibility.sgml of current patch > > > > I recall now that with the latest fix, we cannot run > > pg_sync_replication_slots() unless we disable the slot-sync worker. > > Considering that, I think it will be too many steps just to run the SQL > > function at > > the end without much value added. Thus we can skip this point, we can rely > > on > > slot sync worker completely. > > Agreed. I didn't change this. > > Here is the V3 doc patch. Thanks for the patch. It will be good if 1a can produce quoted slot-names list as output, which can be used directly in step 1b's query; otherwise, it is little inconvenient to give input to 1b if the number of slots are huge. User needs to manually quote each slot-name. Other than this, the patch looks good to me. thanks Shveta
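One possible way for step 1a to emit an already-quoted list that can be pasted directly into 1b's IN (...) clause is to wrap each slot name with quote_literal() and aggregate with string_agg(); a sketch, with the tablesync-slot branch of the documented query omitted for brevity:

SELECT string_agg(quote_literal(slot_name), ', ') AS quoted_slots
FROM (
    SELECT subslotname AS slot_name
    FROM pg_subscription
    WHERE subfailover
) AS s
WHERE slot_name IS NOT NULL;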
RE: Synchronizing slots from primary to standby
On Monday, April 29, 2024 5:11 PM shveta malik wrote: > > On Mon, Apr 29, 2024 at 11:38 AM shveta malik > wrote: > > > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > > wrote: > > > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > wrote: > > > > > > > > Hi, > > > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > > > > Hi, > > > > > > > > > > Since the standby_slot_names patch has been committed, I am > > > > > attaching the last doc patch for review. > > > > > > > > > > > > > Thanks! > > > > > > > > 1 === > > > > > > > > + continue subscribing to publications now on the new primary > > > > + server > > > > without > > > > + any data loss. > > > > > > > > I think "without any data loss" should be re-worded in this > > > > context. Data loss in the sense "data committed on the primary and > > > > not visible on the subscriber in case of failover" can still occurs (in > > > > case > synchronous replication is not used). > > > > > > > > 2 === > > > > > > > > + If the result (failover_ready) of both above > > > > steps is > > > > + true, existing subscriptions will be able to continue without data > loss. > > > > + > > > > > > > > I don't think that's true if synchronous replication is not used. > > > > Say, > > > > > > > > - synchronous replication is not used > > > > - primary is not able to reach the standby anymore and > > > > standby_slot_names is set > > > > - new data is inserted into the primary > > > > - then not replicated to subscriber (due to standby_slot_names) > > > > > > > > Then I think the both above steps will return true but data would > > > > be lost in case of failover. > > > > > > Thanks for the comments, attach the new version patch which reworded > > > the above places. > > > > Thanks for the patch. > > > > Few comments: > > > > 1) Tested the steps, one of the queries still refers to > > 'conflict_reason'. I think it should refer 'conflicting'. Thanks for catching this. Fixed. > > > > 2) Will it be good to mention that in case of planned promotion, it is > > recommended to run pg_sync_replication_slots() as last sync attempt > > before we run failvoer-ready validation steps? This can be mentioned > > in high-availaibility.sgml of current patch > > I recall now that with the latest fix, we cannot run > pg_sync_replication_slots() unless we disable the slot-sync worker. > Considering that, I think it will be too many steps just to run the SQL > function at > the end without much value added. Thus we can skip this point, we can rely on > slot sync worker completely. Agreed. I didn't change this. Here is the V3 doc patch. Best Regards, Hou zj v3-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch Description: v3-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
Re: Synchronizing slots from primary to standby
On Mon, Apr 29, 2024 at 11:38 AM shveta malik wrote: > > On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > > wrote: > > > > > > Hi, > > > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > > > Hi, > > > > > > > > Since the standby_slot_names patch has been committed, I am attaching > > > > the last doc patch for review. > > > > > > > > > > Thanks! > > > > > > 1 === > > > > > > + continue subscribing to publications now on the new primary server > > > without > > > + any data loss. > > > > > > I think "without any data loss" should be re-worded in this context. Data > > > loss in > > > the sense "data committed on the primary and not visible on the > > > subscriber in > > > case of failover" can still occurs (in case synchronous replication is > > > not used). > > > > > > 2 === > > > > > > + If the result (failover_ready) of both above steps > > > is > > > + true, existing subscriptions will be able to continue without data > > > loss. > > > + > > > > > > I don't think that's true if synchronous replication is not used. Say, > > > > > > - synchronous replication is not used > > > - primary is not able to reach the standby anymore and standby_slot_names > > > is > > > set > > > - new data is inserted into the primary > > > - then not replicated to subscriber (due to standby_slot_names) > > > > > > Then I think the both above steps will return true but data would be lost > > > in case > > > of failover. > > > > Thanks for the comments, attach the new version patch which reworded the > > above places. > > Thanks for the patch. > > Few comments: > > 1) Tested the steps, one of the queries still refers to > 'conflict_reason'. I think it should refer 'conflicting'. > > 2) Will it be good to mention that in case of planned promotion, it is > recommended to run pg_sync_replication_slots() as last sync attempt > before we run failvoer-ready validation steps? This can be mentioned > in high-availaibility.sgml of current patch I recall now that with the latest fix, we cannot run pg_sync_replication_slots() unless we disable the slot-sync worker. Considering that, I think it will be too many steps just to run the SQL function at the end without much value added. Thus we can skip this point, we can rely on slot sync worker completely. thanks Shveta
Re: Synchronizing slots from primary to standby
On Mon, Apr 29, 2024 at 10:57 AM Zhijie Hou (Fujitsu) wrote: > > On Friday, March 15, 2024 10:45 PM Bertrand Drouvot > > wrote: > > > > Hi, > > > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > > Hi, > > > > > > Since the standby_slot_names patch has been committed, I am attaching > > > the last doc patch for review. > > > > > > > Thanks! > > > > 1 === > > > > + continue subscribing to publications now on the new primary server > > without > > + any data loss. > > > > I think "without any data loss" should be re-worded in this context. Data > > loss in > > the sense "data committed on the primary and not visible on the subscriber in > > case of failover" can still occurs (in case synchronous replication is not > > used). > > > > 2 === > > > > + If the result (failover_ready) of both above steps is > > + true, existing subscriptions will be able to continue without data loss. > > + > > > > I don't think that's true if synchronous replication is not used. Say, > > > > - synchronous replication is not used > > - primary is not able to reach the standby anymore and standby_slot_names is > > set > > - new data is inserted into the primary > > - then not replicated to subscriber (due to standby_slot_names) > > > > Then I think the both above steps will return true but data would be lost > > in case > > of failover. > > Thanks for the comments, attach the new version patch which reworded the > above places. Thanks for the patch. Few comments: 1) Tested the steps; one of the queries still refers to 'conflict_reason'. I think it should refer to 'conflicting'. 2) Would it be good to mention that, in case of a planned promotion, it is recommended to run pg_sync_replication_slots() as a last sync attempt before we run the failover-ready validation steps? This could be mentioned in high-availability.sgml of the current patch. thanks Shveta
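For reference, the standby-side readiness check that comment 1) refers to appears elsewhere in the thread roughly as:

-- Run on the standby; uses the "conflicting" column rather than the
-- older "conflict_reason".
SELECT slot_name,
       (synced AND NOT temporary AND NOT conflicting) AS failover_ready
FROM pg_replication_slots
WHERE slot_name IN ('sub1', 'sub2', 'sub3');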
RE: Synchronizing slots from primary to standby
On Friday, March 15, 2024 10:45 PM Bertrand Drouvot wrote: > > Hi, > > On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote: > > Hi, > > > > Since the standby_slot_names patch has been committed, I am attaching > > the last doc patch for review. > > > > Thanks! > > 1 === > > + continue subscribing to publications now on the new primary server > without > + any data loss. > > I think "without any data loss" should be re-worded in this context. Data > loss in > the sense "data committed on the primary and not visible on the subscriber in > case of failover" can still occurs (in case synchronous replication is not > used). > > 2 === > > + If the result (failover_ready) of both above steps is > + true, existing subscriptions will be able to continue without data loss. > + > > I don't think that's true if synchronous replication is not used. Say, > > - synchronous replication is not used > - primary is not able to reach the standby anymore and standby_slot_names is > set > - new data is inserted into the primary > - then not replicated to subscriber (due to standby_slot_names) > > Then I think the both above steps will return true but data would be lost in > case > of failover. Thanks for the comments, attach the new version patch which reworded the above places. Best Regards, Hou zj v2-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch Description: v2-0001-Document-the-steps-to-check-if-the-standby-is-rea.patch
RE: Synchronizing slots from primary to standby
On Friday, April 12, 2024 11:31 AM Amit Kapila wrote: > > On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Thursday, April 11, 2024 12:11 PM Amit Kapila > wrote: > > > > > > > > 2. > > > - if (remote_slot->restart_lsn < slot->data.restart_lsn) > > > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) > > > elog(ERROR, > > > "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > > > > Can we be more specific in this message? How about splitting it into > > > error_message as "cannot synchronize local slot \"%s\"" and then > > > errdetail as "Local slot's start streaming location LSN(%X/%X) is > > > ahead of remote slot's LSN(%X/%X)"? > > > > Your version looks better. Since the above two messages all have > > errdetail, I used the style of ereport(ERROR, errmsg_internal(), > > errdetail_internal()... in the patch which is equal to the elog(ERROR but > > has an > additional detail message. > > > > makes sense. > > > Here is V5 patch set. > > > > I think we should move the check to not advance slot when one of > remote_slot's restart_lsn or catalog_xmin is lesser than the local slot inside > update_local_synced_slot() as we want to prevent updating slot in those cases > even during slot synchronization. Agreed. Here is the V6 patch which addressed this. I have merged the two patches into one. Best Regards, Hou zj v6-0001-Fix-the-handling-of-LSN-and-xmin-in-slot-sync.patch Description: v6-0001-Fix-the-handling-of-LSN-and-xmin-in-slot-sync.patch
Re: Synchronizing slots from primary to standby
On Thu, Apr 11, 2024 at 5:04 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, April 11, 2024 12:11 PM Amit Kapila > wrote: > > > > > 2. > > - if (remote_slot->restart_lsn < slot->data.restart_lsn) > > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) > > elog(ERROR, > > "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > > > Can we be more specific in this message? How about splitting it into > > error_message as "cannot synchronize local slot \"%s\"" and then errdetail > > as > > "Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's > > LSN(%X/%X)"? > > Your version looks better. Since the above two messages all have errdetail, I > used the style of ereport(ERROR, errmsg_internal(), errdetail_internal()... in > the patch which is equal to the elog(ERROR but has an additional detail > message. > makes sense. > Here is V5 patch set. > I think we should move the check to not advance slot when one of remote_slot's restart_lsn or catalog_xmin is lesser than the local slot inside update_local_synced_slot() as we want to prevent updating slot in those cases even during slot synchronization. -- With Regards, Amit Kapila.
RE: Synchronizing slots from primary to standby
On Thursday, April 11, 2024 12:11 PM Amit Kapila wrote: > > On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Thursday, April 4, 2024 5:37 PM Amit Kapila > wrote: > > > > > > BTW, while thinking on this one, I > > > noticed that in the function LogicalConfirmReceivedLocation(), we > > > first update the disk copy, see comment [1] and then in-memory > > > whereas the same is not true in > > > update_local_synced_slot() for the case when snapshot exists. Now, > > > do we have the same risk here in case of standby? Because I think we > > > will use these xmins while sending the feedback message (in > XLogWalRcvSendHSFeedback()). > > > > > > * We have to write the changed xmin to disk *before* we change > > > * the in-memory value, otherwise after a crash we wouldn't know > > > * that some catalog tuples might have been removed already. > > > > Yes, I think we have the risk on the standby, I can reproduce the case > > that if the server crashes after updating the in-memory value and > > before saving them to disk, the synced slot could be invalidated after > > restarting from crash, because the necessary rows have been removed on > > the primary. The steps can be found in [1]. > > > > I think we'd better fix the order in update_local_synced_slot() as > > well. I tried to make the fix in 0002, 0001 is Shveta's patch to fix > > another issue in this thread. Since they are touching the same function, so > attach them together for review. > > > > Few comments: > === > 1. > + > + /* Sanity check */ > + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) > + ereport(LOG, errmsg("synchronized confirmed_flush for slot \"%s\" > + differs from > remote slot", > +remote_slot->name), > > Is there a reason to use elevel as LOG instead of ERROR? I think it should be > elog(ERROR, .. as this is an unexpected case. Agreed. > > 2. > - if (remote_slot->restart_lsn < slot->data.restart_lsn) > + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) > elog(ERROR, > "cannot synchronize local slot \"%s\" LSN(%X/%X)" > > Can we be more specific in this message? How about splitting it into > error_message as "cannot synchronize local slot \"%s\"" and then errdetail as > "Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's > LSN(%X/%X)"? Your version looks better. Since the above two messages all have errdetail, I used the style of ereport(ERROR, errmsg_internal(), errdetail_internal()... in the patch which is equal to the elog(ERROR but has an additional detail message. Here is V5 patch set. Best Regards, Hou zj v5-0002-write-the-changed-xmin-to-disk-before-computing-g.patch Description: v5-0002-write-the-changed-xmin-to-disk-before-computing-g.patch v5-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: v5-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
Re: Synchronizing slots from primary to standby
On Wed, Apr 10, 2024 at 5:28 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, April 4, 2024 5:37 PM Amit Kapila > wrote: > > > > BTW, while thinking on this one, I > > noticed that in the function LogicalConfirmReceivedLocation(), we first > > update > > the disk copy, see comment [1] and then in-memory whereas the same is not > > true in > > update_local_synced_slot() for the case when snapshot exists. Now, do we > > have > > the same risk here in case of standby? Because I think we will use these > > xmins > > while sending the feedback message (in XLogWalRcvSendHSFeedback()). > > > > * We have to write the changed xmin to disk *before* we change > > * the in-memory value, otherwise after a crash we wouldn't know > > * that some catalog tuples might have been removed already. > > Yes, I think we have the risk on the standby, I can reproduce the case that if > the server crashes after updating the in-memory value and before saving them > to > disk, the synced slot could be invalidated after restarting from crash, > because > the necessary rows have been removed on the primary. The steps can be found in > [1]. > > I think we'd better fix the order in update_local_synced_slot() as well. I > tried to make the fix in 0002, 0001 is Shveta's patch to fix another issue in > this thread. Since > they are touching the same function, so attach them together for review. > Few comments: === 1. + + /* Sanity check */ + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) + ereport(LOG, + errmsg("synchronized confirmed_flush for slot \"%s\" differs from remote slot", +remote_slot->name), Is there a reason to use elevel as LOG instead of ERROR? I think it should be elog(ERROR, .. as this is an unexpected case. 2. - if (remote_slot->restart_lsn < slot->data.restart_lsn) + if (remote_slot->confirmed_lsn < slot->data.confirmed_flush) elog(ERROR, "cannot synchronize local slot \"%s\" LSN(%X/%X)" Can we be more specific in this message? How about splitting it into error_message as "cannot synchronize local slot \"%s\"" and then errdetail as "Local slot's start streaming location LSN(%X/%X) is ahead of remote slot's LSN(%X/%X)"? -- With Regards, Amit Kapila.
RE: Synchronizing slots from primary to standby
On Thursday, April 4, 2024 5:37 PM Amit Kapila wrote: > > BTW, while thinking on this one, I > noticed that in the function LogicalConfirmReceivedLocation(), we first update > the disk copy, see comment [1] and then in-memory whereas the same is not > true in > update_local_synced_slot() for the case when snapshot exists. Now, do we have > the same risk here in case of standby? Because I think we will use these xmins > while sending the feedback message (in XLogWalRcvSendHSFeedback()). > > * We have to write the changed xmin to disk *before* we change > * the in-memory value, otherwise after a crash we wouldn't know > * that some catalog tuples might have been removed already. Yes, I think we have the risk on the standby, I can reproduce the case that if the server crashes after updating the in-memory value and before saving them to disk, the synced slot could be invalidated after restarting from crash, because the necessary rows have been removed on the primary. The steps can be found in [1]. I think we'd better fix the order in update_local_synced_slot() as well. I tried to make the fix in 0002, 0001 is Shveta's patch to fix another issue in this thread. Since they are touching the same function, so attach them together for review. [1] -- Primary: SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true); -- Standby: SELECT 'init' FROM pg_create_logical_replication_slot('standbylogicalslot', 'test_decoding', false, false, false); SELECT pg_sync_replication_slots(); -- Primary: CREATE TABLE test (a int); INSERT INTO test VALUES(1); DROP TABLE test; SELECT txid_current(); SELECT txid_current(); SELECT txid_current(); SELECT pg_log_standby_snapshot(); SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn()); -- Standby: - wait for standby to replay all the changes on the primary. - this is to serialize snapshots. SELECT pg_replication_slot_advance('standbylogicalslot', pg_last_wal_replay_lsn()); - Use gdb to stop at the place after calling ReplicationSlotsComputexx() functions and before calling ReplicationSlotSave(). SELECT pg_sync_replication_slots(); -- Primary: - First, wait for the primary slot(the physical slot)'s catalog xmin to be updated to the same as the failover slot. VACUUM FULL; - Wait for VACUMM FULL to be replayed on standby. -- Standby: - For the process which is blocked by gdb, let the process crash (elog(PANIC, ...)). After restarting the standby from crash, we can see the synced slot is invalidated. LOG: invalidating obsolete replication slot "logicalslot" DETAIL: The slot conflicted with xid horizon 741. CONTEXT: WAL redo at 0/3059B90 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 741, isCatalogRel: T, nplans: 0, nredirected: 0, ndead: 7, nunused: 0, dead: [22, 23, 24, 25, 26, 27, 28]; blkref #0: rel 1663/5/1249, blk 16 Best Regards, Hou zj v4-0002-write-the-changed-xmin-to-disk-before-computing-g.patch Description: v4-0002-write-the-changed-xmin-to-disk-before-computing-g.patch v4-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: v4-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch
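As a quick cross-check of the invalidation after the crash restart described above, one could also query the synced slot on the standby (the server log remains the authoritative evidence):

-- On the standby, after restarting from the crash; "logicalslot" is the
-- slot name used in the reproduction steps above.
SELECT slot_name, synced, conflicting
FROM pg_replication_slots
WHERE slot_name = 'logicalslot';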
RE: Synchronizing slots from primary to standby
On Thursday, April 4, 2024 4:25 PM Masahiko Sawada wrote: Hi, > On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila > wrote: > > > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila > wrote: > > > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > > wrote: > > > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > > > +if (DecodingContextReady(ctx) && > found_consistent_snapshot) > > > > +*found_consistent_snapshot = true; > > > > > > > > Can the found_consistent_snapshot be checked first to help avoid > > > > the function call DecodingContextReady() for > > > > pg_replication_slot_advance callers? > > > > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > > message. I'll push this patch after some more testing. > > > > > > > Pushed! > > While testing this change, I realized that it could happen that the server > logs > are flooded with the following logical decoding logs that are written every > 200 > ms: Thanks for reporting! > > 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.270 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.270 JST [3838739] DETAIL: There are no running > transactions. > 2024-04-04 16:15:19.477 JST [3838739] LOG: starting logical decoding for slot > "test_sub" > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: Streaming transactions > committing after 0/50006F48, reading WAL from 0/50006F10. > 2024-04-04 16:15:19.477 JST [3838739] LOG: logical decoding found > consistent point at 0/50006F10 > 2024-04-04 16:15:19.477 JST [3838739] DETAIL: There are no running > transactions. > > For example, I could reproduce it with the following steps: > > 1. create the primary and start. > 2. run "pgbench -i -s 100" on the primary. > 3. run pg_basebackup to create the standby. > 4. configure slotsync setup on the standby and start. > 5. create a publication for all tables on the primary. > 6. create the subscriber and start. > 7. run "pgbench -i -Idtpf" on the subscriber. > 8. create a subscription on the subscriber (initial data copy will start). > > The logical decoding logs were written every 200 ms during the initial data > synchronization. > > Looking at the new changes for update_local_synced_slot(): ... > We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn, > restart_lsn, and catalog_xmin is different between the remote slot and the > local > slot. In my test case, during the initial sync performing, only catalog_xmin > was > different and there was no serialized snapshot at restart_lsn, and the > slotsync > worker called LogicalSlotAdvanceAndCheckSnapState(). However no slot > properties were changed even after the function and it set slot_updated = > true. > So it starts the next slot synchronization after 200ms. I was trying to reproduce this and check why the catalog_xmin is different among synced slot and remote slot, but I was not able to reproduce the case where there are lots of logical decoding logs. The script I used is attached. Would it be possible for you to share the script you used to reproduce this issue? Alternatively, could you please share the log files from both the primary and standby servers after reproducing the problem (it would be greatly helpful if you could set the log level to DEBUG2). Best Regards, Hou zj test.sh Description: test.sh
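While investigating why catalog_xmin (or one of the other positions) differs, the slot properties that the slotsync worker compares can be inspected on both nodes with something like:

-- Run on the primary (remote slot) and on the standby (synced slot) to
-- compare the values; "test_sub" is the subscription slot name from the
-- log excerpts above.
SELECT slot_name, restart_lsn, confirmed_flush_lsn, catalog_xmin
FROM pg_replication_slots
WHERE slot_name = 'test_sub';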
Re: Synchronizing slots from primary to standby
On Mon, Apr 8, 2024 at 7:01 PM Zhijie Hou (Fujitsu) wrote: > > Thanks for pushing. > > I checked the BF status, and noticed one BF failure, which I think is related > to > a miss in the test code. > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 > > From the following log, I can see the sync failed because the standby is > lagging behind of the failover slot. > > - > # No postmaster PID for node "cascading_standby" > error running SQL: 'psql::1: ERROR: skipping slot synchronization as > the received slot sync LSN 0/4000148 for slot "snap_test_slot" is ahead of > the standby position 0/4000114' > while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI > dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT > pg_sync_replication_slots();' at > /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm > line 2042. > # Postmaster PID for node "publisher" is 3715298 > - > > I think it's because we missed to call wait_for_replay_catchup before syncing > slots. > > - > $primary->safe_psql('postgres', > "SELECT pg_create_logical_replication_slot('snap_test_slot', > 'test_decoding', false, false, true);" > ); > # ? missed to wait here > $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); > - > > While testing, I noticed another place where we were calling > wait_for_replay_catchup before doing pg_replication_slot_advance, which also > has > a small possibility to cause the failover slot to be ahead of the standby if > some logs are written in between these two steps. So, I adjusted them > together. > > Here is a small patch to improve the test. > LGTM. I'll push this tomorrow morning unless there are any more comments or suggestions. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Mon, Apr 8, 2024 at 9:49 PM Andres Freund wrote: > > On 2024-04-08 16:01:41 +0530, Amit Kapila wrote: > > Pushed. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 > > This unfortunately is a commit after > Right, and thanks for the report. Hou-San has analyzed and shared the patch [1] for this yesterday. I'll review it today. [1] - https://www.postgresql.org/message-id/OS0PR01MB571665359F2F5DCD3ADABC9F94002%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
Hi, On 2024-04-08 16:01:41 +0530, Amit Kapila wrote: > Pushed. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 This unfortunately is a commit after commit 6f3d8d5e7cc Author: Amit Kapila Date: 2024-04-08 13:21:55 +0530 Fix the intermittent buildfarm failures in 040_standby_failover_slots_sync. Greetings, Andres
RE: Synchronizing slots from primary to standby
On Monday, April 8, 2024 6:32 PM Amit Kapila wrote: > > On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) > wrote: > > > > On Saturday, April 6, 2024 12:43 PM Amit Kapila > wrote: > > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > > > wrote: > > > > > > Yeah, that could be the first step. We can probably add an injection > > > point to control the bgwrite behavior and then add tests involving > > > walsender performing the decoding. But I think it is important to > > > have sufficient tests in this area as I see they are quite helpful in > > > uncovering > the issues. > > > > Here is the patch to drop the subscription in the beginning so that > > the restart_lsn of the lsub1_slot won't be advanced due to concurrent > > xl_running_xacts from bgwriter. The subscription will be re-created > > after all the slots are sync-ready. I think maybe we can use this to > > stabilize the test as a first step and then think about how to make > > use of injection point to add more tests if it's worth it. > > > > Pushed. Thanks for pushing. I checked the BF status, and noticed one BF failure, which I think is related to a miss in the test code. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-08%2012%3A04%3A27 From the following log, I can see the sync failed because the standby is lagging behind of the failover slot. - # No postmaster PID for node "cascading_standby" error running SQL: 'psql::1: ERROR: skipping slot synchronization as the received slot sync LSN 0/4000148 for slot "snap_test_slot" is ahead of the standby position 0/4000114' while running 'psql -XAtq -d port=50074 host=/tmp/t4HQFlrDmI dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_sync_replication_slots();' at /home/bf/bf-build/adder/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 2042. # Postmaster PID for node "publisher" is 3715298 - I think it's because we missed to call wait_for_replay_catchup before syncing slots. - $primary->safe_psql('postgres', "SELECT pg_create_logical_replication_slot('snap_test_slot', 'test_decoding', false, false, true);" ); # ? missed to wait here $standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();"); - While testing, I noticed another place where we were calling wait_for_replay_catchup before doing pg_replication_slot_advance, which also has a small possibility to cause the failover slot to be ahead of the standby if some logs are written in between these two steps. So, I adjusted them together. Here is a small patch to improve the test. Best Regards, Hou zj 0001-Ensure-the-standby-is-not-lagging-behind-the-failove.patch Description: 0001-Ensure-the-standby-is-not-lagging-behind-the-failove.patch
Re: Synchronizing slots from primary to standby
On Mon, Apr 8, 2024 at 12:19 PM Zhijie Hou (Fujitsu) wrote: > > On Saturday, April 6, 2024 12:43 PM Amit Kapila > wrote: > > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > > wrote: > > > > Yeah, that could be the first step. We can probably add an injection point > > to > > control the bgwrite behavior and then add tests involving walsender > > performing the decoding. But I think it is important to have sufficient > > tests in > > this area as I see they are quite helpful in uncovering the issues. > > Here is the patch to drop the subscription in the beginning so that the > restart_lsn of the lsub1_slot won't be advanced due to concurrent > xl_running_xacts from bgwriter. The subscription will be re-created after all > the slots are sync-ready. I think maybe we can use this to stabilize the test > as a first step and then think about how to make use of injection point to add > more tests if it's worth it. > Pushed. -- With Regards, Amit Kapila.
RE: Synchronizing slots from primary to standby
On Saturday, April 6, 2024 12:43 PM Amit Kapila wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > wrote: > > > > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila > wrote: > > > Thinking more on this, it doesn't seem related to > > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't > > > change any locking or something like that which impacts write positions. > > > > Agree. > > > > > I think what has happened here is that running_xact record written > > > by the background writer [1] is not written to the kernel or disk > > > (see LogStandbySnapshot()), before pg_current_wal_lsn() checks the > > > current_lsn to be compared with replayed LSN. > > > > Agree, I think it's not visible through pg_current_wal_lsn() yet. > > > > Also I think that the DEBUG message in LogCurrentRunningXacts() > > > > " > > elog(DEBUG2, > > "snapshot of %d+%d running transaction ids (lsn %X/%X oldest > xid %u latest complete %u next xid %u)", > > CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, > > LSN_FORMAT_ARGS(recptr), > > CurrRunningXacts->oldestRunningXid, > > CurrRunningXacts->latestCompletedXid, > > CurrRunningXacts->nextXid); " > > > > should be located after the XLogSetAsyncXactLSN() call. Indeed, the > > new LSN is visible after the spinlock (XLogCtl->info_lck) in > > XLogSetAsyncXactLSN() is released, > > > > I think the new LSN can be visible only when the corresponding WAL is written > by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can make it > visible. In your experiment below, isn't it possible that in the meantime WAL > writer has written that WAL due to which you are seeing an updated location? > > >see: > > > > \watch on Session 1 provides: > > > > pg_current_wal_lsn > > > > 0/87D110 > > (1 row) > > > > Until: > > > > Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at > xlog.c:2579 > > 2579XLogRecPtr WriteRqstPtr = asyncXactLSN; > > (gdb) n > > 2581boolwakeup = false; > > (gdb) > > 2584SpinLockAcquire(&XLogCtl->info_lck); > > (gdb) > > 2585RefreshXLogWriteResult(LogwrtResult); > > (gdb) > > 2586sleeping = XLogCtl->WalWriterSleeping; > > (gdb) > > 2587prevAsyncXactLSN = XLogCtl->asyncXactLSN; > > (gdb) > > 2588if (XLogCtl->asyncXactLSN < asyncXactLSN) > > (gdb) > > 2589XLogCtl->asyncXactLSN = asyncXactLSN; > > (gdb) > > 2590SpinLockRelease(&XLogCtl->info_lck); > > (gdb) p p/x (uint32) XLogCtl->asyncXactLSN > > $1 = 0x87d148 > > > > Then session 1 provides: > > > > pg_current_wal_lsn > > > > 0/87D148 > > (1 row) > > > > So, when we see in the log: > > > > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: > > snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 > > latest complete 739 next xid 740) > > 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: > statement: SELECT '0/360' <= replay_lsn AND state = 'streaming' > > > > It's indeed possible that the new LSN was not visible yet (spinlock > > not released?) before the query began (because we can not rely on the > > time the DEBUG message has been emitted). > > > > > Note that the reason why > > > walsender has picked the running_xact written by background writer > > > is because it has checked after pg_current_wal_lsn() query, see LOGs [2]. > > > I think we can probably try to reproduce manually via debugger. > > > > > > If this theory is correct > > > > It think it is. 
> > > > > then I think we will need to use injection points to control the > > > behavior of bgwriter or use the slots created via SQL API for > > > syncing in tests. > > > > > > Thoughts? > > > > I think that maybe as a first step we should move the "elog(DEBUG2," > > message as proposed above to help debugging (that could help to confirm > the above theory). > > > > I think I am missing how exactly moving DEBUG2 can confirm the above theory. > > > If the theory is proven then I'm not sure we need the extra complexity > > of injection point here, maybe just relying on the slots created via > > SQL API could be enough. > > > > Yeah, that could be the first step. We can probably add an injection point to > control the bgwrite behavior and then add tests involving walsender > performing the decoding. But I think it is important to have sufficient tests > in > this area as I see they are quite helpful in uncovering the issues. Here is the patch to drop the subscription in the beginning so that the restart_lsn of the lsub1_slot won't be advanced due to concurrent xl_running_xacts from bgwriter. The subscription will be re-created after all the slots are sync-ready. I think maybe we can use this to stabilize the test as a first step and then think about how to make use of
Re: Synchronizing slots from primary to standby
On Sun, Apr 7, 2024 at 3:06 AM Andres Freund wrote: > > On 2024-04-06 10:58:32 +0530, Amit Kapila wrote: > > On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote: > > > > > > > There are still a few pending issues to be fixed in this feature but > > otherwise, we have committed all the main patches, so I marked the CF > > entry corresponding to this work as committed. > > There are a a fair number of failures of 040_standby_failover_slots_sync in > the buildfarm. It'd be nice to get those fixed soon-ish. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-04-06%2020%3A58%3A50 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-04-06%2015%3A18%3A08 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-04-06%2010%3A13%3A58 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-04-05%2016%3A04%3A10 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-04-05%2014%3A59%3A40 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-05%2014%3A59%3A07 > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-04-05%2014%3A18%3A07 > > The symptoms are similar, but not entirely identical across all of them, I > think. > I have analyzed these failures and there are two different tests that are failing but the underlying reason is the same as being discussed with Bertrand. We are working on the fix. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
Hi, On 2024-04-06 10:58:32 +0530, Amit Kapila wrote: > On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote: > > > > There are still a few pending issues to be fixed in this feature but > otherwise, we have committed all the main patches, so I marked the CF > entry corresponding to this work as committed. There are a fair number of failures of 040_standby_failover_slots_sync in the buildfarm. It'd be nice to get those fixed soon-ish. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=flaviventris&dt=2024-04-06%2020%3A58%3A50 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2024-04-06%2015%3A18%3A08 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-04-06%2010%3A13%3A58 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-04-05%2016%3A04%3A10 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-04-05%2014%3A59%3A40 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-04-05%2014%3A59%3A07 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-04-05%2014%3A18%3A07 The symptoms are similar, but not entirely identical across all of them, I think. I've also seen a bunch of failures of this test locally. Greetings, Andres Freund
Re: Synchronizing slots from primary to standby
Hi, On Sat, Apr 06, 2024 at 10:13:00AM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot > wrote: > > I think the new LSN can be visible only when the corresponding WAL is > written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can > make it visible. In your experiment below, isn't it possible that in > the meantime WAL writer has written that WAL due to which you are > seeing an updated location? What I did is: session 1: select pg_current_wal_lsn();\watch 1 session 2: select pg_backend_pid(); terminal 1: tail -f logfile | grep -i snap terminal 2 : gdb -p <pid of session 2's backend>, with a breakpoint on LogCurrentRunningXacts(); once it is hit: Breakpoint 1, LogCurrentRunningXacts (...) at standby.c:1346 1346 { (gdb) n 1350 Then next, next until the DEBUG message is emitted (confirmed in terminal 1). At this stage the DEBUG message shows the new LSN while session 1 still displays the previous LSN. Then once XLogSetAsyncXactLSN() is done in the gdb session (terminal 2) then session 1 displays the new LSN. This is reproducible as desired. With more debugging I can see that when the spinlock is released in XLogSetAsyncXactLSN() then XLogWrite() is doing its job and then session 1 does see the new value (that happens in this order, and as you said that's expected). My point is that while the DEBUG message is emitted session 1 still sees the old LSN (until the new LSN is visible). I think that we should emit the DEBUG message once session 1 can see the new value (if not, I think the timestamp of the DEBUG message can be misleading for debugging purposes). > I think I am missing how exactly moving DEBUG2 can confirm the above theory. I meant to say that instead of seeing: 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/360' <= replay_lsn AND state = 'streaming' We would probably see something like: 2024-04-05 04:37:05. UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/360' <= replay_lsn AND state = 'streaming' 2024-04-05 04:37:05.+xx UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740) And then it would be clear that the query has run before the new LSN is visible. > > If the theory is proven then I'm not sure we need the extra complexity of > > injection point here, maybe just relying on the slots created via SQL API > > could > > be enough. > > > > Yeah, that could be the first step. We can probably add an injection > point to control the bgwriter behavior and then add tests involving > walsender performing the decoding. But I think it is important to have > sufficient tests in this area as I see they are quite helpful in > uncovering the issues. > Yeah agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Sat, Apr 6, 2024 at 10:13 AM Amit Kapila wrote: > There are still a few pending issues to be fixed in this feature but otherwise, we have committed all the main patches, so I marked the CF entry corresponding to this work as committed. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 8:05 PM Bertrand Drouvot wrote: > > On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > > Thinking more on this, it doesn't seem related to > > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change > > any locking or something like that which impacts write positions. > > Agree. > > > I think what has happened here is that running_xact record written by > > the background writer [1] is not written to the kernel or disk (see > > LogStandbySnapshot()), before pg_current_wal_lsn() checks the > > current_lsn to be compared with replayed LSN. > > Agree, I think it's not visible through pg_current_wal_lsn() yet. > > Also I think that the DEBUG message in LogCurrentRunningXacts() > > " > elog(DEBUG2, > "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid > %u latest complete %u next xid %u)", > CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, > LSN_FORMAT_ARGS(recptr), > CurrRunningXacts->oldestRunningXid, > CurrRunningXacts->latestCompletedXid, > CurrRunningXacts->nextXid); > " > > should be located after the XLogSetAsyncXactLSN() call. Indeed, the new LSN is > visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is > released, > I think the new LSN can be visible only when the corresponding WAL is written by XLogWrite(). I don't know what in XLogSetAsyncXactLSN() can make it visible. In your experiment below, isn't it possible that in the meantime WAL writer has written that WAL due to which you are seeing an updated location? >see: > > \watch on Session 1 provides: > > pg_current_wal_lsn > > 0/87D110 > (1 row) > > Until: > > Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579 > 2579XLogRecPtr WriteRqstPtr = asyncXactLSN; > (gdb) n > 2581boolwakeup = false; > (gdb) > 2584SpinLockAcquire(&XLogCtl->info_lck); > (gdb) > 2585RefreshXLogWriteResult(LogwrtResult); > (gdb) > 2586sleeping = XLogCtl->WalWriterSleeping; > (gdb) > 2587prevAsyncXactLSN = XLogCtl->asyncXactLSN; > (gdb) > 2588if (XLogCtl->asyncXactLSN < asyncXactLSN) > (gdb) > 2589XLogCtl->asyncXactLSN = asyncXactLSN; > (gdb) > 2590SpinLockRelease(&XLogCtl->info_lck); > (gdb) p p/x (uint32) XLogCtl->asyncXactLSN > $1 = 0x87d148 > > Then session 1 provides: > > pg_current_wal_lsn > > 0/87D148 > (1 row) > > So, when we see in the log: > > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot > of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete > 739 next xid 740) > 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: > SELECT '0/360' <= replay_lsn AND state = 'streaming' > > It's indeed possible that the new LSN was not visible yet (spinlock not > released?) > before the query began (because we can not rely on the time the DEBUG message > has > been emitted). > > > Note that the reason why > > walsender has picked the running_xact written by background writer is > > because it has checked after pg_current_wal_lsn() query, see LOGs [2]. > > I think we can probably try to reproduce manually via debugger. > > > > If this theory is correct > > It think it is. > > > then I think we will need to use injection > > points to control the behavior of bgwriter or use the slots created > > via SQL API for syncing in tests. > > > > Thoughts? > > I think that maybe as a first step we should move the "elog(DEBUG2," message > as > proposed above to help debugging (that could help to confirm the above > theory). 
> I think I am missing how exactly moving DEBUG2 can confirm the above theory. > If the theory is proven then I'm not sure we need the extra complexity of > injection point here, maybe just relying on the slots created via SQL API > could > be enough. > Yeah, that could be the first step. We can probably add an injection point to control the bgwriter behavior and then add tests involving walsender performing the decoding. But I think it is important to have sufficient tests in this area as I see they are quite helpful in uncovering the issues. -- With Regards, Amit Kapila.
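For readers unfamiliar with the injection-point facility referred to above, a minimal sketch of what such a hook could look like in the background writer path is shown below. The point name, the helper function, and its placement are illustrative assumptions only; none of this comes from a posted patch.

    #include "postgres.h"
    #include "storage/standby.h"
    #include "utils/injection_point.h"

    /*
     * Hypothetical helper: a test that attaches a 'wait' action to this
     * injection point can hold off the periodic xl_running_xacts record,
     * which is the bgwriter behavior the test wants to control.
     */
    static void
    maybe_log_standby_snapshot(void)
    {
        INJECTION_POINT("bgwriter-before-running-xacts");
        (void) LogStandbySnapshot();
    }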
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 02:35:42PM +, Bertrand Drouvot wrote: > I think that maybe as a first step we should move the "elog(DEBUG2," message > as > proposed above to help debugging (that could help to confirm the above > theory). If you agree and think that makes sense, pleae find attached a tiny patch doing so. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From a414e81a4c5c88963b2412d1641f3de1262c15c0 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Fri, 5 Apr 2024 14:49:51 + Subject: [PATCH v1] Move DEBUG message in LogCurrentRunningXacts() Indeed the new LSN is visible to others session (say through pg_current_wal_lsn()) only after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is released. So moving the DEBUG message after the XLogSetAsyncXactLSN() call seems more appropriate for debugging purpose. --- src/backend/storage/ipc/standby.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) 100.0% src/backend/storage/ipc/ diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 87b04e51b3..ba549acf50 100644 --- a/src/backend/storage/ipc/standby.c +++ b/src/backend/storage/ipc/standby.c @@ -1366,6 +1366,17 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts) recptr = XLogInsert(RM_STANDBY_ID, XLOG_RUNNING_XACTS); + /* + * Ensure running_xacts information is synced to disk not too far in the + * future. We don't want to stall anything though (i.e. use XLogFlush()), + * so we let the wal writer do it during normal operation. + * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced + * and nudge the WALWriter into action if sleeping. Check + * XLogBackgroundFlush() for details why a record might not be flushed + * without it. + */ + XLogSetAsyncXactLSN(recptr); + if (CurrRunningXacts->subxid_overflow) elog(DEBUG2, "snapshot of %d running transactions overflowed (lsn %X/%X oldest xid %u latest complete %u next xid %u)", @@ -1383,17 +1394,6 @@ LogCurrentRunningXacts(RunningTransactions CurrRunningXacts) CurrRunningXacts->latestCompletedXid, CurrRunningXacts->nextXid); - /* - * Ensure running_xacts information is synced to disk not too far in the - * future. We don't want to stall anything though (i.e. use XLogFlush()), - * so we let the wal writer do it during normal operation. - * XLogSetAsyncXactLSN() conveniently will mark the LSN as to-be-synced - * and nudge the WALWriter into action if sleeping. Check - * XLogBackgroundFlush() for details why a record might not be flushed - * without it. - */ - XLogSetAsyncXactLSN(recptr); - return recptr; } -- 2.34.1
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 06:23:10PM +0530, Amit Kapila wrote: > On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > Thinking more on this, it doesn't seem related to > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change > any locking or something like that which impacts write positions. Agree. > I think what has happened here is that running_xact record written by > the background writer [1] is not written to the kernel or disk (see > LogStandbySnapshot()), before pg_current_wal_lsn() checks the > current_lsn to be compared with replayed LSN. Agree, I think it's not visible through pg_current_wal_lsn() yet. Also I think that the DEBUG message in LogCurrentRunningXacts() " elog(DEBUG2, "snapshot of %d+%d running transaction ids (lsn %X/%X oldest xid %u latest complete %u next xid %u)", CurrRunningXacts->xcnt, CurrRunningXacts->subxcnt, LSN_FORMAT_ARGS(recptr), CurrRunningXacts->oldestRunningXid, CurrRunningXacts->latestCompletedXid, CurrRunningXacts->nextXid); " should be located after the XLogSetAsyncXactLSN() call. Indeed, the new LSN is visible after the spinlock (XLogCtl->info_lck) in XLogSetAsyncXactLSN() is released, see: \watch on Session 1 provides: pg_current_wal_lsn 0/87D110 (1 row) Until: Breakpoint 2, XLogSetAsyncXactLSN (asyncXactLSN=8900936) at xlog.c:2579 2579XLogRecPtr WriteRqstPtr = asyncXactLSN; (gdb) n 2581boolwakeup = false; (gdb) 2584SpinLockAcquire(&XLogCtl->info_lck); (gdb) 2585RefreshXLogWriteResult(LogwrtResult); (gdb) 2586sleeping = XLogCtl->WalWriterSleeping; (gdb) 2587prevAsyncXactLSN = XLogCtl->asyncXactLSN; (gdb) 2588if (XLogCtl->asyncXactLSN < asyncXactLSN) (gdb) 2589XLogCtl->asyncXactLSN = asyncXactLSN; (gdb) 2590SpinLockRelease(&XLogCtl->info_lck); (gdb) p p/x (uint32) XLogCtl->asyncXactLSN $1 = 0x87d148 Then session 1 provides: pg_current_wal_lsn 0/87D148 (1 row) So, when we see in the log: 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.197 UTC [3866475][client backend][2/4:0] LOG: statement: SELECT '0/360' <= replay_lsn AND state = 'streaming' It's indeed possible that the new LSN was not visible yet (spinlock not released?) before the query began (because we can not rely on the time the DEBUG message has been emitted). > Note that the reason why > walsender has picked the running_xact written by background writer is > because it has checked after pg_current_wal_lsn() query, see LOGs [2]. > I think we can probably try to reproduce manually via debugger. > > If this theory is correct It think it is. > then I think we will need to use injection > points to control the behavior of bgwriter or use the slots created > via SQL API for syncing in tests. > > Thoughts? I think that maybe as a first step we should move the "elog(DEBUG2," message as proposed above to help debugging (that could help to confirm the above theory). If the theory is proven then I'm not sure we need the extra complexity of injection point here, maybe just relying on the slots created via SQL API could be enough. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 5:17 PM Amit Kapila wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > There is an intermittent BF failure observed at [1] after this commit > > (2ec005b). > > > > Thanks for analyzing and providing the patch. I'll look into it. There > is another BF failure [1] which I have analyzed. The main reason for > failure is the following test: > > # Failed test 'logical slots have synced as true on standby' > # at > /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl > line 198. > # got: 'f' > # expected: 't' > > Here, we are expecting that the two logical slots (lsub1_slot, and > lsub2_slot), one created via subscription and another one via API > pg_create_logical_replication_slot() are synced. The standby LOGs > which are as follows show that the one created by API 'lsub2_slot' is > synced but the other one 'lsub1_slot': > > LOG for lsub1_slot: > > 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL: > Streaming transactions committing after 0/0, reading WAL from > 0/360. > 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] > STATEMENT: SELECT pg_sync_replication_slots(); > 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG: > xmin required by slots: data 0, catalog 740 > 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG: > could not sync slot "lsub1_slot" > > LOG for lsub2_slot: > > 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG: > xmin required by slots: data 0, catalog 740 > 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG: > newly created slot "lsub2_slot" is sync-ready now > 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] > STATEMENT: SELECT pg_sync_replication_slots(); > > We can see from the log of lsub1_slot that its restart_lsn is > 0/360 which means it will start reading from the WAL from that > location. Now, if we check the publisher log, we have running_xacts > record at that location. See following LOGs: > > 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG: > statement: SELECT pg_create_logical_replication_slot('lsub2_slot', > 'test_decoding', false, false, true); > 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG: > snapshot of 0+0 running transaction ids (lsn 0/360 oldest xid 740 > latest complete 739 next xid 740) > > > 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: > snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 > latest complete 739 next xid 740) > > The first running_xact record ends at 360 and the second one at > 398. So, the start location of the second running_xact is 360, > the same can be confirmed by the following LOG line of walsender: > > 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG: > serializing snapshot to pg_logical/snapshots/0-360.snap > > This shows that while processing running_xact at location 360, we > have serialized the snapshot. As there is no running transaction in > WAL at 360 so ideally we should have reached a consistent state > after processing that record on standby. But the reason standby didn't > process that LOG is that the confirmed_flush LSN is also at the same > location so the function LogicalSlotAdvanceAndCheckSnapState() exits > without reading the WAL at that location. 
Now, this can be confirmed > by the below walsender-specific LOG in publisher: > > 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write > 0/360 flush 0/360 apply 0/360 reply_time 2024-04-05 > 04:36:59.155181+00 > > We update the confirmed_flush location with the flush location after > receiving the above feedback. You can notice that we didn't receive > the feedback for the 398 location and hence both the > confirmed_flush and restart_lsn are at the same location 0/360. > Now, the test is waiting for the subscriber to send feedback of the > last WAL write location by > $primary->wait_for_catchup('regress_mysub1'); As noticed from the > publisher LOGs, the query we used for wait is: > > SELECT '0/360' <= replay_lsn AND state = 'streaming' > FROM pg_catalog.pg_stat_replication > WHERE application_name IN ('regress_mysub1', 'walreceiver') > > Here, instead of '0/360' it should have used ''0/398' which is > the last write location. This position we get via function > pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And > this variable seems to be touched by commit > c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely could > c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for failure? At > this stage, I am not sure so just sharing with others to see if what I > am saying sounds logical. I'll think more about this. > Thinking more on this, it doesn't seem related to c9920a9068eac2e6c8fb34988d18c0b42b9bf811 as that commit doesn't change any locking or something like that which impacts write positions.
Re: Synchronizing slots from primary to standby
On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > There is an intermittent BF failure observed at [1] after this commit > (2ec005b). > Thanks for analyzing and providing the patch. I'll look into it. There is another BF failure [1] which I have analyzed. The main reason for failure is the following test: # Failed test 'logical slots have synced as true on standby' # at /home/bf/bf-build/serinus/HEAD/pgsql/src/test/recovery/t/040_standby_failover_slots_sync.pl line 198. # got: 'f' # expected: 't' Here, we are expecting that the two logical slots (lsub1_slot, and lsub2_slot), one created via subscription and another one via API pg_create_logical_replication_slot() are synced. The standby LOGs which are as follows show that the one created by API 'lsub2_slot' is synced but the other one 'lsub1_slot': LOG for lsub1_slot: 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] DETAIL: Streaming transactions committing after 0/0, reading WAL from 0/360. 2024-04-05 04:37:07.421 UTC [3867682][client backend][0/2:0] STATEMENT: SELECT pg_sync_replication_slots(); 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] DEBUG: xmin required by slots: data 0, catalog 740 2024-04-05 04:37:07.422 UTC [3867682][client backend][0/2:0] LOG: could not sync slot "lsub1_slot" LOG for lsub2_slot: 2024-04-05 04:37:08.518 UTC [3867682][client backend][0/2:0] DEBUG: xmin required by slots: data 0, catalog 740 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] LOG: newly created slot "lsub2_slot" is sync-ready now 2024-04-05 04:37:08.769 UTC [3867682][client backend][0/2:0] STATEMENT: SELECT pg_sync_replication_slots(); We can see from the log of lsub1_slot that its restart_lsn is 0/360 which means it will start reading from the WAL from that location. Now, if we check the publisher log, we have running_xacts record at that location. See following LOGs: 2024-04-05 04:36:57.830 UTC [3860839][client backend][8/2:0] LOG: statement: SELECT pg_create_logical_replication_slot('lsub2_slot', 'test_decoding', false, false, true); 2024-04-05 04:36:58.718 UTC [3860839][client backend][8/2:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/360 oldest xid 740 latest complete 739 next xid 740) 2024-04-05 04:37:05.074 UTC [3854278][background writer][:0] DEBUG: snapshot of 0+0 running transaction ids (lsn 0/398 oldest xid 740 latest complete 739 next xid 740) The first running_xact record ends at 360 and the second one at 398. So, the start location of the second running_xact is 360, the same can be confirmed by the following LOG line of walsender: 2024-04-05 04:37:05.144 UTC [3857385][walsender][25/0:0] DEBUG: serializing snapshot to pg_logical/snapshots/0-360.snap This shows that while processing running_xact at location 360, we have serialized the snapshot. As there is no running transaction in WAL at 360 so ideally we should have reached a consistent state after processing that record on standby. But the reason standby didn't process that LOG is that the confirmed_flush LSN is also at the same location so the function LogicalSlotAdvanceAndCheckSnapState() exits without reading the WAL at that location. Now, this can be confirmed by the below walsender-specific LOG in publisher: 2024-04-05 04:36:59.155 UTC [3857385][walsender][25/0:0] DEBUG: write 0/360 flush 0/360 apply 0/360 reply_time 2024-04-05 04:36:59.155181+00 We update the confirmed_flush location with the flush location after receiving the above feedback. 
You can notice that we didn't receive the feedback for the 398 location and hence both the confirmed_flush and restart_lsn are at the same location 0/360. Now, the test is waiting for the subscriber to send feedback of the last WAL write location by $primary->wait_for_catchup('regress_mysub1'); As noticed from the publisher LOGs, the query we used for wait is: SELECT '0/360' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('regress_mysub1', 'walreceiver') Here, instead of '0/360' it should have used ''0/398' which is the last write location. This position we get via function pg_current_wal_lsn()->GetXLogWriteRecPtr()->LogwrtResult.Write. And this variable seems to be touched by commit c9920a9068eac2e6c8fb34988d18c0b42b9bf811. Though unlikely could c9920a9068eac2e6c8fb34988d18c0b42b9bf811 be a reason for failure? At this stage, I am not sure so just sharing with others to see if what I am saying sounds logical. I'll think more about this. [1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-04-05%2004%3A34%3A27 -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 4:31 PM Bertrand Drouvot wrote: > > BTW, I just realized that the LSN I used in my example in the > LSN_FORMAT_ARGS() > are not the right ones. Noted. Thanks. Please find v3 with the comments addressed. thanks Shveta v3-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: Binary data
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 04:09:01PM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot > wrote: > > > > What about something like? > > > > ereport(LOG, > > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs > > from remote slot", > > remote_slot->name), > > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > > LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > > > Regards, > > +1. Better than earlier. I will update and post the patch. > Thanks! BTW, I just realized that the LSN I used in my example in the LSN_FORMAT_ARGS() are not the right ones. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 10:09 AM Bertrand Drouvot wrote: > > What about something like? > > ereport(LOG, > errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from > remote slot", > remote_slot->name), > errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", > LSN_FORMAT_ARGS(remote_slot->restart_lsn), > LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > Regards, +1. Better than earlier. I will update and post the patch. thanks Shveta
Re: Synchronizing slots from primary to standby
Hi, On Fri, Apr 05, 2024 at 09:43:35AM +0530, shveta malik wrote: > On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik > > > wrote: > > 2 === > > > > + if (slot->data.confirmed_flush != > > remote_slot->confirmed_lsn) > > + elog(LOG, > > +"could not synchronize local slot > > \"%s\" LSN(%X/%X)" > > +" to remote slot's LSN(%X/%X) ", > > +remote_slot->name, > > + > > LSN_FORMAT_ARGS(slot->data.confirmed_flush), > > + > > LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)); > > > > I don't think that the message is correct here. Unless I am missing > > something > > there is nothing in the following code path that would prevent the slot to > > be > > sync during this cycle. > > This is a sanity check, I will put a comment to indicate the same. Thanks! > We > want to ensure if anything changes in future, we get correct logs to > indicate that. Right, understood that way. > If required, the LOG msg can be changed. Kindly suggest if you have > anything better in mind. > What about something like? ereport(LOG, errmsg("synchronized confirmed_flush_lsn for slot \"%s\" differs from remote slot", remote_slot->name), errdetail("Remote slot has LSN %X/%X but local slot has LSN %X/%X.", LSN_FORMAT_ARGS(remote_slot->restart_lsn), LSN_FORMAT_ARGS(slot->data.restart_lsn)); Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Fri, Apr 5, 2024 at 9:22 AM Bertrand Drouvot wrote: > > Hi, > > On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > > > > > > Prior to commit 2ec005b, this check was okay, as we did not expect > > > restart_lsn of the synced slot to be ahead of remote since we were > > > directly copying the lsns. But now when we use 'advance' to do logical > > > decoding on standby, there is a possibility that restart lsn of the > > > synced slot is ahead of remote slot, if there are running txns records > > > found after reaching consistent-point while consuming WALs from > > > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance > > > may end up serializing snapshots and setting restart_lsn to the > > > serialized snapshot point, ahead of remote one. > > > > > > Fix: > > > The sanity check needs to be corrected. Attached a patch to address the > > > issue. > > > > Thanks for reporting, explaining the issue and providing a patch. > > Regarding the patch: > > 1 === > > +* Attempt to sync lsns and xmins only if remote slot is ahead of > local > > s/lsns/LSNs/? > > 2 === > > + if (slot->data.confirmed_flush != > remote_slot->confirmed_lsn) > + elog(LOG, > +"could not synchronize local slot > \"%s\" LSN(%X/%X)" > +" to remote slot's LSN(%X/%X) ", > +remote_slot->name, > + > LSN_FORMAT_ARGS(slot->data.confirmed_flush), > + > LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)); > > I don't think that the message is correct here. Unless I am missing something > there is nothing in the following code path that would prevent the slot to be > sync during this cycle. This is a sanity check, I will put a comment to indicate the same. We want to ensure if anything changes in future, we get correct logs to indicate that. If required, the LOG msg can be changed. Kindly suggest if you have anything better in mind. thanks Shveta
Re: Synchronizing slots from primary to standby
Hi, On Thu, Apr 04, 2024 at 05:31:45PM +0530, shveta malik wrote: > On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > > > > Prior to commit 2ec005b, this check was okay, as we did not expect > > restart_lsn of the synced slot to be ahead of remote since we were > > directly copying the lsns. But now when we use 'advance' to do logical > > decoding on standby, there is a possibility that restart lsn of the > > synced slot is ahead of remote slot, if there are running txns records > > found after reaching consistent-point while consuming WALs from > > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance > > may end up serializing snapshots and setting restart_lsn to the > > serialized snapshot point, ahead of remote one. > > > > Fix: > > The sanity check needs to be corrected. Attached a patch to address the > > issue. > Thanks for reporting, explaining the issue and providing a patch. Regarding the patch: 1 === +* Attempt to sync lsns and xmins only if remote slot is ahead of local s/lsns/LSNs/? 2 === + if (slot->data.confirmed_flush != remote_slot->confirmed_lsn) + elog(LOG, +"could not synchronize local slot \"%s\" LSN(%X/%X)" +" to remote slot's LSN(%X/%X) ", +remote_slot->name, + LSN_FORMAT_ARGS(slot->data.confirmed_flush), + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn)); I don't think that the message is correct here. Unless I am missing something there is nothing in the following code path that would prevent the slot to be sync during this cycle. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Thu, Apr 4, 2024 at 2:59 PM shveta malik wrote: > > > Prior to commit 2ec005b, this check was okay, as we did not expect > restart_lsn of the synced slot to be ahead of remote since we were > directly copying the lsns. But now when we use 'advance' to do logical > decoding on standby, there is a possibility that restart lsn of the > synced slot is ahead of remote slot, if there are running txns records > found after reaching consistent-point while consuming WALs from > restart_lsn till confirmed_lsn. In such a case, slot-sync's advance > may end up serializing snapshots and setting restart_lsn to the > serialized snapshot point, ahead of remote one. > > Fix: > The sanity check needs to be corrected. Attached a patch to address the issue. Please find v2 which has detailed commit-msg and some more comments in code. thanks Shveta v2-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Thu, Apr 4, 2024 at 1:55 PM Masahiko Sawada wrote: > > While testing this change, I realized that it could happen that the > server logs are flooded with the following logical decoding logs that > are written every 200 ms: > > 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical decoding > for slot "test_sub" ... ... > > For example, I could reproduce it with the following steps: > > 1. create the primary and start. > 2. run "pgbench -i -s 100" on the primary. > 3. run pg_basebackup to create the standby. > 4. configure slotsync setup on the standby and start. > 5. create a publication for all tables on the primary. > 6. create the subscriber and start. > 7. run "pgbench -i -Idtpf" on the subscriber. > 8. create a subscription on the subscriber (initial data copy will start). > > The logical decoding logs were written every 200 ms during the initial > data synchronization. > > Looking at the new changes for update_local_synced_slot(): > > if (remote_slot->confirmed_lsn != slot->data.confirmed_flush || > remote_slot->restart_lsn != slot->data.restart_lsn || > remote_slot->catalog_xmin != slot->data.catalog_xmin) > { > /* > * We can't directly copy the remote slot's LSN or xmin unless there > * exists a consistent snapshot at that point. Otherwise, after > * promotion, the slots may not reach a consistent point before the > * confirmed_flush_lsn which can lead to a data loss. To avoid data > * loss, we let slot machinery advance the slot which ensures that > * snapbuilder/slot statuses are updated properly. > */ > if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > { > /* > * Update the slot info directly if there is a serialized snapshot > * at the restart_lsn, as the slot can quickly reach consistency > * at restart_lsn by restoring the snapshot. > */ > SpinLockAcquire(&slot->mutex); > slot->data.restart_lsn = remote_slot->restart_lsn; > slot->data.confirmed_flush = remote_slot->confirmed_lsn; > slot->data.catalog_xmin = remote_slot->catalog_xmin; > slot->effective_catalog_xmin = remote_slot->catalog_xmin; > SpinLockRelease(&slot->mutex); > > if (found_consistent_snapshot) > *found_consistent_snapshot = true; > } > else > { > LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn, > found_consistent_snapshot); > } > > ReplicationSlotsComputeRequiredXmin(false); > ReplicationSlotsComputeRequiredLSN(); > > slot_updated = true; > > We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn, > restart_lsn, and catalog_xmin is different between the remote slot and > the local slot. In my test case, during the initial sync performing, > only catalog_xmin was different and there was no serialized snapshot > at restart_lsn, and the slotsync worker called > LogicalSlotAdvanceAndCheckSnapState(). However no slot properties were > changed even after the function and it set slot_updated = true. So it > starts the next slot synchronization after 200ms. > > It seems to me that we can skip calling > LogicalSlotAdvanceAndCheckSnapState() at least when the remote and > local have the same restart_lsn and confirmed_lsn. > I think we can do that but do we know what caused catalog_xmin to be updated regularly without any change in restart/confirmed_flush LSN? I think the LSNs are not updated during the initial sync (copy) time but how catalog_xmin is getting updated for the same slot? 
BTW, if we see, we will probably anyway accept this xmin as it is due to the following code in LogicalIncreaseXminForSlot() LogicalIncreaseXminForSlot() { /* * If the client has already confirmed up to this lsn, we directly can * mark this as accepted. This can happen if we restart decoding in a * slot. */ else if (current_lsn <= slot->data.confirmed_flush) { slot->candidate_catalog_xmin = xmin; slot->candidate_xmin_lsn = current_lsn; /* our candidate can directly be used */ updated_xmin = true; } > I'm not sure there are other scenarios but is it worth fixing this symptom? > I think so but let's investigate this a bit more. BTW, while thinking on this one, I noticed that in the function LogicalConfirmReceivedLocation(), we first update the disk copy, see comment [1] and then in-memory whereas the same is not true in update_local_synced_slot() for the case when snapshot exists. Now, do we have the same risk here in case of standby? Because I think we will use these xmins while sending the feedback message (in XLogWalRcvSendHSFeedback()). [1] /* * We have to write the changed xmin to disk *before* we change * the in-memory value, otherwise after a crash we wouldn't know * that some catalog tuples might have been removed
Re: Synchronizing slots from primary to standby
On Wed, Apr 3, 2024 at 3:36 PM Amit Kapila wrote: > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote: > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > wrote: > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > +if (DecodingContextReady(ctx) && found_consistent_snapshot) > > > +*found_consistent_snapshot = true; > > > > > > Can the found_consistent_snapshot be checked first to help avoid the > > > function call DecodingContextReady() for pg_replication_slot_advance > > > callers? > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > message. I'll push this patch after some more testing. > > > > Pushed! There is an intermittent BF failure observed at [1] after this commit (2ec005b). Analysis: We see in BF logs, that during the first call of the sync function, restart_lsn of the synced slot is advanced to a lsn which is > than remote slot's restart_lsn. And when sync call is done second time without any further change on primary, it hits the error: ERROR: cannot synchronize local slot "lsub1_slot" LSN(0/360) to remote slot's LSN(0/328) as synchronization would move it backwards Relevant BF logs are given at [2]. This reproduces intermittently depending upon if bgwriter logs running txn record when the test is running. We were able to mimic the test case to reproduce the failure. Please see attached bf-test.txt for steps. Issue: Issue is that we are having a wrong sanity check based on 'restart_lsn' in synchronize_one_slot(): if (remote_slot->restart_lsn < slot->data.restart_lsn) elog(ERROR, ...); Prior to commit 2ec005b, this check was okay, as we did not expect restart_lsn of the synced slot to be ahead of remote since we were directly copying the lsns. But now when we use 'advance' to do logical decoding on standby, there is a possibility that restart lsn of the synced slot is ahead of remote slot, if there are running txns records found after reaching consistent-point while consuming WALs from restart_lsn till confirmed_lsn. In such a case, slot-sync's advance may end up serializing snapshots and setting restart_lsn to the serialized snapshot point, ahead of remote one. Fix: The sanity check needs to be corrected. Attached a patch to address the issue. a) The sanity check is corrected to compare confirmed_lsn rather than restart_lsn. Additional changes: b) A log has been added after LogicalSlotAdvanceAndCheckSnapState() to log the case when the local and remote slots' confirmed-lsn were not found to be the same after sync (if at all). c) Now we attempt to sync in update_local_synced_slot() if one of confirmed_lsn, restart_lsn, and catalog_xmin for remote slot is ahead of local slot instead of them just being unequal. [1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-04-03%2017%3A57%3A28 [2]: 2024-04-03 18:00:41.896 UTC [3239617][client backend][0/2:0] LOG: statement: SELECT pg_sync_replication_slots(); LOG: starting logical decoding for slot "lsub1_slot" DETAIL: Streaming transactions committing after 0/0, reading WAL from 0/328. 
LOG: logical decoding found consistent point at 0/328 DEBUG: serializing snapshot to pg_logical/snapshots/0-360.snap DEBUG: got new restart lsn 0/360 at 0/360 LOG: newly created slot "lsub1_slot" is sync-ready now 2024-04-03 18:00:45.218 UTC [3243487][client backend][2/2:0] LOG: statement: SELECT pg_sync_replication_slots(); ERROR: cannot synchronize local slot "lsub1_slot" LSN(0/360) to remote slot's LSN(0/328) as synchronization would move it backwards thanks Shveta Keep sync_replication_slots as disabled on standby. --primary: SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true); SELECT pg_log_standby_snapshot(); SELECT pg_log_standby_snapshot(); SELECT pg_log_standby_snapshot(); SELECT pg_log_standby_snapshot(); select pg_sleep(2); SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn()); select slot_name,database,xmin,catalog_xmin cxmin,restart_lsn,confirmed_flush_lsn flushLsn,wal_status,conflicting conf, failover,synced,temporary temp, active from pg_replication_slots order by slot_name; --standby: select pg_sync_replication_slots(); select pg_sync_replication_slots(); -- ERROR here v1-0001-Correct-sanity-check-to-compare-confirmed_lsn.patch Description: Binary data
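To make the proposed change described in item (a) above concrete, a sketch of how the corrected sanity check in synchronize_one_slot() could look is given below. This is an illustrative rendering based on the description in this mail, not the attached patch text; the idea is to compare the confirmed LSNs, which only ever move forward on the primary, instead of restart_lsn, which the local advance may legitimately move past the remote value.

    /* Illustrative sketch only; see the attached patch for the real change. */
    if (remote_slot->confirmed_lsn < slot->data.confirmed_flush)
        elog(ERROR,
             "cannot synchronize local slot \"%s\" LSN(%X/%X) to remote slot's LSN(%X/%X) as synchronization would move it backwards",
             remote_slot->name,
             LSN_FORMAT_ARGS(slot->data.confirmed_flush),
             LSN_FORMAT_ARGS(remote_slot->confirmed_lsn));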
Re: Synchronizing slots from primary to standby
On Wed, Apr 3, 2024 at 7:06 PM Amit Kapila wrote: > > On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote: > > > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > > wrote: > > > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > > > +if (DecodingContextReady(ctx) && found_consistent_snapshot) > > > +*found_consistent_snapshot = true; > > > > > > Can the found_consistent_snapshot be checked first to help avoid the > > > function call DecodingContextReady() for pg_replication_slot_advance > > > callers? > > > > > > > Okay, changed. Additionally, I have updated the comments and commit > > message. I'll push this patch after some more testing. > > > > Pushed! While testing this change, I realized that it could happen that the server logs are flooded with the following logical decoding logs that are written every 200 ms: 2024-04-04 16:15:19.270 JST [3838739] LOG: starting logical decoding for slot "test_sub" 2024-04-04 16:15:19.270 JST [3838739] DETAIL: Streaming transactions committing after 0/50006F48, reading WAL from 0/50006F10. 2024-04-04 16:15:19.270 JST [3838739] LOG: logical decoding found consistent point at 0/50006F10 2024-04-04 16:15:19.270 JST [3838739] DETAIL: There are no running transactions. 2024-04-04 16:15:19.477 JST [3838739] LOG: starting logical decoding for slot "test_sub" 2024-04-04 16:15:19.477 JST [3838739] DETAIL: Streaming transactions committing after 0/50006F48, reading WAL from 0/50006F10. 2024-04-04 16:15:19.477 JST [3838739] LOG: logical decoding found consistent point at 0/50006F10 2024-04-04 16:15:19.477 JST [3838739] DETAIL: There are no running transactions. For example, I could reproduce it with the following steps: 1. create the primary and start. 2. run "pgbench -i -s 100" on the primary. 3. run pg_basebackup to create the standby. 4. configure slotsync setup on the standby and start. 5. create a publication for all tables on the primary. 6. create the subscriber and start. 7. run "pgbench -i -Idtpf" on the subscriber. 8. create a subscription on the subscriber (initial data copy will start). The logical decoding logs were written every 200 ms during the initial data synchronization. Looking at the new changes for update_local_synced_slot(): if (remote_slot->confirmed_lsn != slot->data.confirmed_flush || remote_slot->restart_lsn != slot->data.restart_lsn || remote_slot->catalog_xmin != slot->data.catalog_xmin) { /* * We can't directly copy the remote slot's LSN or xmin unless there * exists a consistent snapshot at that point. Otherwise, after * promotion, the slots may not reach a consistent point before the * confirmed_flush_lsn which can lead to a data loss. To avoid data * loss, we let slot machinery advance the slot which ensures that * snapbuilder/slot statuses are updated properly. */ if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) { /* * Update the slot info directly if there is a serialized snapshot * at the restart_lsn, as the slot can quickly reach consistency * at restart_lsn by restoring the snapshot. 
*/ SpinLockAcquire(&slot->mutex); slot->data.restart_lsn = remote_slot->restart_lsn; slot->data.confirmed_flush = remote_slot->confirmed_lsn; slot->data.catalog_xmin = remote_slot->catalog_xmin; slot->effective_catalog_xmin = remote_slot->catalog_xmin; SpinLockRelease(&slot->mutex); if (found_consistent_snapshot) *found_consistent_snapshot = true; } else { LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn, found_consistent_snapshot); } ReplicationSlotsComputeRequiredXmin(false); ReplicationSlotsComputeRequiredLSN(); slot_updated = true; We call LogicalSlotAdvanceAndCheckSnapState() if one of confirmed_lsn, restart_lsn, and catalog_xmin is different between the remote slot and the local slot. In my test case, during the initial sync performing, only catalog_xmin was different and there was no serialized snapshot at restart_lsn, and the slotsync worker called LogicalSlotAdvanceAndCheckSnapState(). However no slot properties were changed even after the function and it set slot_updated = true. So it starts the next slot synchronization after 200ms. It seems to me that we can skip calling LogicalSlotAdvanceAndCheckSnapState() at least when the remote and local have the same restart_lsn and confirmed_lsn. I'm not sure there are other scenarios but is it worth fixing this symptom? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
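A minimal sketch of the skip being suggested here (hypothetical, not a posted patch): wrap the advance/copy logic quoted above so that it only runs when one of the LSNs has actually moved. How a catalog_xmin-only difference should then be applied is deliberately left open, since that is the question raised in the reply further up the thread.

    if (remote_slot->restart_lsn != slot->data.restart_lsn ||
        remote_slot->confirmed_lsn != slot->data.confirmed_flush)
    {
        if (SnapBuildSnapshotExists(remote_slot->restart_lsn))
        {
            /* direct copy of the remote slot's values, as in the excerpt above */
        }
        else
            LogicalSlotAdvanceAndCheckSnapState(remote_slot->confirmed_lsn,
                                                found_consistent_snapshot);
    }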
Re: Synchronizing slots from primary to standby
On Wed, Apr 3, 2024 at 11:13 AM Amit Kapila wrote: > > On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy > wrote: > > > I quickly looked at v8, and have a nit, rest all looks good. > > > > +if (DecodingContextReady(ctx) && found_consistent_snapshot) > > +*found_consistent_snapshot = true; > > > > Can the found_consistent_snapshot be checked first to help avoid the > > function call DecodingContextReady() for pg_replication_slot_advance > > callers? > > > > Okay, changed. Additionally, I have updated the comments and commit > message. I'll push this patch after some more testing. > Pushed! -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Wed, Apr 3, 2024 at 9:36 AM Bharath Rupireddy wrote: > > On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila wrote: > > > > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr > > > moveto, bool *found_consistent_snapshot) to > > > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool > > > *found_consistent_snapshot) and use it. If others don't like this, I'd > > > at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a > > > static inline function. > > > > > Yeah, we can do that but it is not a performance-sensitive routine so > > don't know if it is worth it. > > Okay for what the patch has right now. No more bikeshedding from me on this. > > > > > The slotsync worker needs to advance the slots from different databases > > > > in > > > > fast_forward. So, we need to skip this check in fast_forward mode. The > > > > analysis can > > > > be found in [3]. > > > -if (slot->data.database != MyDatabaseId) > > > +/* > > > + * We need to access the system tables during decoding to build the > > > + * logical changes unless we are in fast_forward mode where no > > > changes are > > > + * generated. > > > + */ > > > +if (slot->data.database != MyDatabaseId && !fast_forward) > > > ereport(ERROR, > > > > > > It's not clear from the comment that we need it for a slotsync worker > > > to advance the slots from different databases. Can this be put into > > > the comment? Also, specify in the comment, why this is safe? > > > > > It is not specific to slot sync worker but specific to fast_forward > > mode. There is already a comment "We need to access the system tables > > during decoding to build the logical changes unless we are in > > fast_forward mode where no changes are generated." telling why it is > > safe. The point is we need database access to access system tables > > while generating the logical changes and in fast-forward mode, we > > don't generate logical changes so this check is not required. Do let > > me know if you have a different understanding or if my understanding is > > incorrect. > > Understood. Thanks. Just curious, why isn't a problem for the existing > fast_forward mode callers pg_replication_slot_advance and > LogicalReplicationSlotHasPendingWal? > We call those after connecting to the database and the slot also belongs to that database whereas during synchronization of slots on the standby, the slots could be from different databases. > I quickly looked at v8, and have a nit, rest all looks good. > > +if (DecodingContextReady(ctx) && found_consistent_snapshot) > +*found_consistent_snapshot = true; > > Can the found_consistent_snapshot be checked first to help avoid the > function call DecodingContextReady() for pg_replication_slot_advance > callers? > Okay, changed. Additionally, I have updated the comments and commit message. I'll push this patch after some more testing. -- With Regards, Amit Kapila. v9-0001-Ensure-that-the-sync-slots-reach-a-consistent-sta.patch Description: Binary data
Re: Synchronizing slots from primary to standby
On Wed, Apr 3, 2024 at 9:04 AM Amit Kapila wrote: > > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr > > moveto, bool *found_consistent_snapshot) to > > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool > > *found_consistent_snapshot) and use it. If others don't like this, I'd > > at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a > > static inline function. > > > Yeah, we can do that but it is not a performance-sensitive routine so > don't know if it is worth it. Okay for what the patch has right now. No more bikeshedding from me on this. > > > The slotsync worker needs to advance the slots from different databases in > > > fast_forward. So, we need to skip this check in fast_forward mode. The > > > analysis can > > > be found in [3]. > > -if (slot->data.database != MyDatabaseId) > > +/* > > + * We need to access the system tables during decoding to build the > > + * logical changes unless we are in fast_forward mode where no changes > > are > > + * generated. > > + */ > > +if (slot->data.database != MyDatabaseId && !fast_forward) > > ereport(ERROR, > > > > It's not clear from the comment that we need it for a slotsync worker > > to advance the slots from different databases. Can this be put into > > the comment? Also, specify in the comment, why this is safe? > > > It is not specific to slot sync worker but specific to fast_forward > mode. There is already a comment "We need to access the system tables > during decoding to build the logical changes unless we are in > fast_forward mode where no changes are generated." telling why it is > safe. The point is we need database access to access system tables > while generating the logical changes and in fast-forward mode, we > don't generate logical changes so this check is not required. Do let > me if you have a different understanding or if my understanding is > incorrect. Understood. Thanks. Just curious, why isn't a problem for the existing fast_forward mode callers pg_replication_slot_advance and LogicalReplicationSlotHasPendingWal? I quickly looked at v8, and have a nit, rest all looks good. +if (DecodingContextReady(ctx) && found_consistent_snapshot) +*found_consistent_snapshot = true; Can the found_consistent_snapshot be checked first to help avoid the function call DecodingContextReady() for pg_replication_slot_advance callers? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
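For reference, the "static inline" alternative mentioned in this exchange would amount to something like the sketch below (hypothetical; the patch under discussion instead keeps pg_logical_replication_slot_advance() as a plain static wrapper). Passing NULL is safe because, per the v9 excerpt above, the callee only writes through found_consistent_snapshot when the pointer is supplied.

    static inline XLogRecPtr
    pg_logical_replication_slot_advance(XLogRecPtr moveto)
    {
        return LogicalSlotAdvanceAndCheckSnapState(moveto, NULL);
    }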
Re: Synchronizing slots from primary to standby
On Tue, Apr 2, 2024 at 7:42 PM Bharath Rupireddy wrote: > > On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu) > wrote: > > > > > 1. Can we just remove pg_logical_replication_slot_advance and use > > > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > > > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to > > > pg_logical_replication_slot_advance? > > > > > > + * Advance our logical replication slot forward. See > > > + * LogicalSlotAdvanceAndCheckSnapState for details. > > > */ > > > static XLogRecPtr > > > pg_logical_replication_slot_advance(XLogRecPtr moveto) > > > { > > > > It was commented[1] that it's not appropriate for the > > pg_logical_replication_slot_advance to have an out parameter > > 'ready_for_decoding' which looks bit awkward as the functionality doesn't > > match > > the name, and is also not consistent with the style of > > pg_physical_replication_slot_advance(). So, we added a new function. > > I disagree here. A new function just for a parameter is not that great > IMHO. > It is not for the parameter but primarily for the functionality it provides. The additional functionality of whether we reached a consistent point while advancing the slot doesn't sound to suit the current function. Also, we want to keep the signature similar to the existing function pg_physical_replication_slot_advance(). > > I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr > moveto, bool *found_consistent_snapshot) to > pg_logical_replication_slot_advance(XLogRecPtr moveto, bool > *found_consistent_snapshot) and use it. If others don't like this, I'd > at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a > static inline function. > Yeah, we can do that but it is not a performance-sensitive routine so don't know if it is worth it. > > > 5. As far as the test case for this issue is concerned, I'm fine with > > > adding one using an INJECTION point because we seem to be having no > > > consistent way to control postgres writing current snapshot to WAL. > > > > Since me and my colleagues can reproduce the issue consistently after > > applying > > 0002 and it could be rare for concurrent xl_running_xacts to happen, we > > discussed[2] to > > consider adding the INJECTION point after pushing the main fix. > > Right. > > > > 7. > > > +/* > > > + * We need to access the system tables during decoding to build the > > > + * logical changes unless we are in fast-forward mode where no > > > changes > > > are > > > + * generated. > > > + */ > > > +if (slot->data.database != MyDatabaseId && !fast_forward) > > > > > > May I know if we need this change for this fix? > > > > The slotsync worker needs to advance the slots from different databases in > > fast_forward. So, we need to skip this check in fast_forward mode. The > > analysis can > > be found in [3]. > -if (slot->data.database != MyDatabaseId) > +/* > + * We need to access the system tables during decoding to build the > + * logical changes unless we are in fast_forward mode where no changes > are > + * generated. > + */ > +if (slot->data.database != MyDatabaseId && !fast_forward) > ereport(ERROR, > > It's not clear from the comment that we need it for a slotsync worker > to advance the slots from different databases. Can this be put into > the comment? Also, specify in the comment, why this is safe? > It is not specific to slot sync worker but specific to fast_forward mode. 
There is already a comment "We need to access the system tables during decoding to build the logical changes unless we are in fast_forward mode where no changes are generated." telling why it is safe. The point is that we need database access to reach the system tables while generating the logical changes, and in fast-forward mode we don't generate logical changes, so this check is not required. Do let me know if you have a different understanding or if my understanding is incorrect. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Tue, Apr 2, 2024 at 7:25 PM Zhijie Hou (Fujitsu) wrote: > > > 1. Can we just remove pg_logical_replication_slot_advance and use > > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to > > pg_logical_replication_slot_advance? > > > > + * Advance our logical replication slot forward. See > > + * LogicalSlotAdvanceAndCheckSnapState for details. > > */ > > static XLogRecPtr > > pg_logical_replication_slot_advance(XLogRecPtr moveto) > > { > > It was commented[1] that it's not appropriate for the > pg_logical_replication_slot_advance to have an out parameter > 'ready_for_decoding' which looks bit awkward as the functionality doesn't > match > the name, and is also not consistent with the style of > pg_physical_replication_slot_advance(). So, we added a new function. I disagree here. A new function just for a parameter is not that great IMHO. I'd just rename LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr moveto, bool *found_consistent_snapshot) to pg_logical_replication_slot_advance(XLogRecPtr moveto, bool *found_consistent_snapshot) and use it. If others don't like this, I'd at least turn pg_logical_replication_slot_advance(XLogRecPtr moveto) a static inline function. > > 5. As far as the test case for this issue is concerned, I'm fine with > > adding one using an INJECTION point because we seem to be having no > > consistent way to control postgres writing current snapshot to WAL. > > Since me and my colleagues can reproduce the issue consistently after applying > 0002 and it could be rare for concurrent xl_running_xacts to happen, we > discussed[2] to > consider adding the INJECTION point after pushing the main fix. Right. > > 7. > > +/* > > + * We need to access the system tables during decoding to build the > > + * logical changes unless we are in fast-forward mode where no changes > > are > > + * generated. > > + */ > > +if (slot->data.database != MyDatabaseId && !fast_forward) > > > > May I know if we need this change for this fix? > > The slotsync worker needs to advance the slots from different databases in > fast_forward. So, we need to skip this check in fast_forward mode. The > analysis can > be found in [3]. -if (slot->data.database != MyDatabaseId) +/* + * We need to access the system tables during decoding to build the + * logical changes unless we are in fast_forward mode where no changes are + * generated. + */ +if (slot->data.database != MyDatabaseId && !fast_forward) ereport(ERROR, It's not clear from the comment that we need it for a slotsync worker to advance the slots from different databases. Can this be put into the comment? Also, specify in the comment, why this is safe? Also, if this change is needed for only slotsync workers, why not protect it with IsSyncingReplicationSlots()? Otherwise, it might impact non-slotsync callers, no? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
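A minimal sketch of the alternative guard being asked about here, i.e. relaxing the database check only when the slot sync worker itself is doing the fast-forward decoding; whether this narrowing is needed is exactly the open question in this exchange:

    if (slot->data.database != MyDatabaseId &&
        !(fast_forward && IsSyncingReplicationSlots()))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("replication slot \"%s\" was not created in this database",
                        NameStr(slot->data.name))));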
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 8:49 PM Bharath Rupireddy wrote: > > On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu) > wrote: > > > > CFbot[1] complained about one query result's order in the tap-test, so I am > > attaching a V7 patch set which fixed this. There are no changes in 0001. > > > > [1] https://cirrus-ci.com/task/6375962162495488 > > Thanks. Here are some comments: Thanks for the comments. > > 1. Can we just remove pg_logical_replication_slot_advance and use > LogicalSlotAdvanceAndCheckSnapState instead? If worried about the > function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to > pg_logical_replication_slot_advance? > > + * Advance our logical replication slot forward. See > + * LogicalSlotAdvanceAndCheckSnapState for details. > */ > static XLogRecPtr > pg_logical_replication_slot_advance(XLogRecPtr moveto) > { It was commented[1] that it's not appropriate for the pg_logical_replication_slot_advance to have an out parameter 'ready_for_decoding' which looks bit awkward as the functionality doesn't match the name, and is also not consistent with the style of pg_physical_replication_slot_advance(). So, we added a new function. > > 2. > +if (!ready_for_decoding) > +{ > +elog(DEBUG1, "could not find consistent point for synced > slot; restart_lsn = %X/%X", > + LSN_FORMAT_ARGS(slot->data.restart_lsn)); > > Can we specify the slot name in the message? Added. > > 3. Also, can the "could not find consistent point for synced slot; > restart_lsn = %X/%X" be emitted at LOG level just like other messages > in update_and_persist_local_synced_slot. Although, I see "XXX should > this be changed to elog(DEBUG1) perhaps?", these messages need to be > at LOG level as they help debug issues if at all they are hit. Changed to LOG and reworded the message. > > 4. How about using found_consistent_snapshot instead of > ready_for_decoding? A general understanding is that the synced slots > are not allowed for decoding (although with this fix, we do that for > internal purposes), ready_for_decoding looks a bit misleading. Agreed and renamed. > > 5. As far as the test case for this issue is concerned, I'm fine with > adding one using an INJECTION point because we seem to be having no > consistent way to control postgres writing current snapshot to WAL. Since me and my colleagues can reproduce the issue consistently after applying 0002 and it could be rare for concurrent xl_running_xacts to happen, we discussed[2] to consider adding the INJECTION point after pushing the main fix. > > 6. A nit: can we use "fast_forward mode" instead of "fast-forward > mode" just to be consistent? > + * logical changes unless we are in fast-forward mode where no changes > are > > 7. > +/* > + * We need to access the system tables during decoding to build the > + * logical changes unless we are in fast-forward mode where no changes > are > + * generated. > + */ > +if (slot->data.database != MyDatabaseId && !fast_forward) > > May I know if we need this change for this fix? The slotsync worker needs to advance the slots from different databases in fast_forward. So, we need to skip this check in fast_forward mode. The analysis can be found in [3]. Attach the V8 patch which addressed above comments. 
[1] https://www.postgresql.org/message-id/CAA4eK1%2BwkaRi2BrLLC_0gKbHN68Awc9dRp811G3An6A6fuqdOg%40mail.gmail.com [2] https://www.postgresql.org/message-id/ZgvI9iAUWCZ17z5V%40ip-10-97-1-34.eu-west-3.compute.internal [3] https://www.postgresql.org/message-id/CAJpy0uCQ2PDCAqcnbdOz6q_ZqmBfMyBpVqKDqL_XZBP%3DeK-1yw%40mail.gmail.com Best Regards, Hou zj v8-0002-test-the-data-loss-case.patch Description: v8-0002-test-the-data-loss-case.patch v8-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v8-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Re: Synchronizing slots from primary to standby
On Tue, Apr 2, 2024 at 2:11 PM Zhijie Hou (Fujitsu) wrote: > > CFbot[1] complained about one query result's order in the tap-test, so I am > attaching a V7 patch set which fixed this. There are no changes in 0001. > > [1] https://cirrus-ci.com/task/6375962162495488 Thanks. Here are some comments: 1. Can we just remove pg_logical_replication_slot_advance and use LogicalSlotAdvanceAndCheckSnapState instead? If worried about the function naming, LogicalSlotAdvanceAndCheckSnapState can be renamed to pg_logical_replication_slot_advance? + * Advance our logical replication slot forward. See + * LogicalSlotAdvanceAndCheckSnapState for details. */ static XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto) { 2. +if (!ready_for_decoding) +{ +elog(DEBUG1, "could not find consistent point for synced slot; restart_lsn = %X/%X", + LSN_FORMAT_ARGS(slot->data.restart_lsn)); Can we specify the slot name in the message? 3. Also, can the "could not find consistent point for synced slot; restart_lsn = %X/%X" be emitted at LOG level just like other messages in update_and_persist_local_synced_slot. Although, I see "XXX should this be changed to elog(DEBUG1) perhaps?", these messages need to be at LOG level as they help debug issues if at all they are hit. 4. How about using found_consistent_snapshot instead of ready_for_decoding? A general understanding is that the synced slots are not allowed for decoding (although with this fix, we do that for internal purposes), ready_for_decoding looks a bit misleading. 5. As far as the test case for this issue is concerned, I'm fine with adding one using an INJECTION point because we seem to be having no consistent way to control postgres writing current snapshot to WAL. 6. A nit: can we use "fast_forward mode" instead of "fast-forward mode" just to be consistent? + * logical changes unless we are in fast-forward mode where no changes are 7. +/* + * We need to access the system tables during decoding to build the + * logical changes unless we are in fast-forward mode where no changes are + * generated. + */ +if (slot->data.database != MyDatabaseId && !fast_forward) May I know if we need this change for this fix? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
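A minimal sketch of what comments 2 and 3 ask for, i.e. naming the slot and raising the level to LOG, using the variable name proposed in comment 4; the exact message wording here is an assumption, not the text of the final patch:

    if (!found_consistent_snapshot)
        elog(LOG, "could not find consistent point for synced slot \"%s\"; restart_lsn = %X/%X",
             NameStr(slot->data.name),
             LSN_FORMAT_ARGS(slot->data.restart_lsn));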
Re: Synchronizing slots from primary to standby
Hi, On Tue, Apr 02, 2024 at 02:19:30PM +0530, Amit Kapila wrote: > On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot > wrote: > > What about adding a "wait" injection point in LogStandbySnapshot() to > > prevent > > checkpointer/bgwriter to log a standby snapshot? Something among those > > lines: > > > >if (AmCheckpointerProcess() || AmBackgroundWriterProcess()) > >INJECTION_POINT("bgw-log-standby-snapshot"); > > > > And make use of it in the test, something like: > > > >$node_primary->safe_psql('postgres', > >"SELECT injection_points_attach('bgw-log-standby-snapshot', > > 'wait');"); > > > > Sometimes we want the checkpoint to log the standby snapshot as we > need it at a predictable time, maybe one can use > pg_log_standby_snapshot() instead of that. Can we add an injection > point as a separate patch/commit after a bit more discussion? Sure, let's come back to this injection point discussion after the feature freeze. BTW, I think it could also be useful to make use of injection point for the test that has been added in 7f13ac8123. I'll open a new thread for this at that time. >. One other > idea to make such tests predictable is to add a developer-specific GUC > say debug_bg_log_standby_snapshot or something like that but injection > point sounds like a better idea. Agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Tue, Apr 2, 2024 at 1:54 PM Bertrand Drouvot wrote: > > On Tue, Apr 02, 2024 at 07:20:46AM +, Zhijie Hou (Fujitsu) wrote: > > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which > > can > > reproduce the data loss issue consistently on my machine. > > Thanks! > > > It may not reproduce > > in some rare cases if concurrent xl_running_xacts are written by bgwriter, > > but > > I think it's still valuable if it can verify the fix in most cases. > > What about adding a "wait" injection point in LogStandbySnapshot() to prevent > checkpointer/bgwriter to log a standby snapshot? Something among those lines: > >if (AmCheckpointerProcess() || AmBackgroundWriterProcess()) >INJECTION_POINT("bgw-log-standby-snapshot"); > > And make use of it in the test, something like: > >$node_primary->safe_psql('postgres', >"SELECT injection_points_attach('bgw-log-standby-snapshot', > 'wait');"); > Sometimes we want the checkpoint to log the standby snapshot as we need it at a predictable time, maybe one can use pg_log_standby_snapshot() instead of that. Can we add an injection point as a separate patch/commit after a bit more discussion? I want to discuss this in a separate thread so that later we should not get an objection to adding an injection_point at this location. One other idea to make such tests predictable is to add a developer-specific GUC say debug_bg_log_standby_snapshot or something like that but injection point sounds like a better idea. -- With Regards, Amit Kapila.
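A minimal sketch of the developer-GUC idea floated above, assuming a boolean GUC consulted at the top of LogStandbySnapshot(); the GUC name comes from this mail, and the early return value is illustrative only, since the injection-point approach was preferred:

    /* hypothetical developer GUC; true preserves the current behaviour */
    bool        debug_bg_log_standby_snapshot = true;

    /* in LogStandbySnapshot(), before building the running-xacts record */
    if (!debug_bg_log_standby_snapshot &&
        (AmCheckpointerProcess() || AmBackgroundWriterProcess()))
        return GetInsertRecPtr();   /* keep tests free of background standby snapshots */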
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 3:21 PM Zhijie Hou (Fujitsu) wrote: > On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Monday, April 1, 2024 7:30 PM Amit Kapila > > wrote: > > > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > > > > > wrote: > > > > > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > 2. > > > > > +extern XLogRecPtr > > > > > +pg_logical_replication_slot_advance(XLogRecPtr > > > moveto, > > > > > + bool *found_consistent_point); > > > > > + > > > > > > > > > > This API looks a bit awkward as the functionality doesn't match > > > > > the name. How about having a function with name > > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > > > ready_for_decoding) with the same functionality as your patch > > > > > has for > > > > > pg_logical_replication_slot_advance() and then invoke it both > > > > > from pg_logical_replication_slot_advance and slotsync.c. The > > > > > function name is too big, we can think of a shorter name. Any ideas? > > > > > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > > > LogicalSlotAdvanceAndCheckDecoding()? > > > > > > > > > > It is about snapbuild state, so how about naming the function as > > > LogicalSlotAdvanceAndCheckSnapState()? > > > > It looks better to me, so changed. > > > > > > > > I have made quite a few cosmetic changes in comments and code. See > > > attached. This is atop your latest patch. Can you please review and > > > include these changes in the next version? > > > > Thanks, I have reviewed and merged them. > > Attach the V5 patch set which addressed above comments and ran pgindent. > > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which > can > reproduce the data loss issue consistently on my machine. It may not > reproduce in some rare cases if concurrent xl_running_xacts are written by > bgwriter, but I think it's still valuable if it can verify the fix in most > cases. The test > will fail if directly applied on HEAD, and will pass after applying atop of > 0001. CFbot[1] complained about one query result's order in the tap-test, so I am attaching a V7 patch set which fixed this. There are no changes in 0001. [1] https://cirrus-ci.com/task/6375962162495488 Best Regards, Hou zj v7-0002-test-the-data-loss-case.patch Description: v7-0002-test-the-data-loss-case.patch v7-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v7-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Re: Synchronizing slots from primary to standby
Hi, On Tue, Apr 02, 2024 at 07:20:46AM +, Zhijie Hou (Fujitsu) wrote: > I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which > can > reproduce the data loss issue consistently on my machine. Thanks! > It may not reproduce > in some rare cases if concurrent xl_running_xacts are written by bgwriter, but > I think it's still valuable if it can verify the fix in most cases. What about adding a "wait" injection point in LogStandbySnapshot() to prevent checkpointer/bgwriter to log a standby snapshot? Something among those lines: if (AmCheckpointerProcess() || AmBackgroundWriterProcess()) INJECTION_POINT("bgw-log-standby-snapshot"); And make use of it in the test, something like: $node_primary->safe_psql('postgres', "SELECT injection_points_attach('bgw-log-standby-snapshot', 'wait');"); Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) wrote: > > On Monday, April 1, 2024 7:30 PM Amit Kapila > wrote: > > > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > > > > > wrote: > > > > > > > > > > > > > > > > > > > 2. > > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > > moveto, > > > > + bool *found_consistent_point); > > > > + > > > > > > > > This API looks a bit awkward as the functionality doesn't match > > > > the name. How about having a function with name > > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > > ready_for_decoding) with the same functionality as your patch has > > > > for > > > > pg_logical_replication_slot_advance() and then invoke it both from > > > > pg_logical_replication_slot_advance and slotsync.c. The function > > > > name is too big, we can think of a shorter name. Any ideas? > > > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > > LogicalSlotAdvanceAndCheckDecoding()? > > > > > > > It is about snapbuild state, so how about naming the function as > > LogicalSlotAdvanceAndCheckSnapState()? > > It looks better to me, so changed. > > > > > I have made quite a few cosmetic changes in comments and code. See > > attached. This is atop your latest patch. Can you please review and > > include these changes in the next version? > > Thanks, I have reviewed and merged them. > Attach the V5 patch set which addressed above comments and ran pgindent. I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can reproduce the data loss issue consistently on my machine. It may not reproduce in some rare cases if concurrent xl_running_xacts are written by bgwriter, but I think it's still valuable if it can verify the fix in most cases. The test will fail if directly applied on HEAD, and will pass after applying atop of 0001. Best Regards, Hou zj v6-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v6-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch v6-0002-test-the-data-loss-case.patch Description: v6-0002-test-the-data-loss-case.patch
Re: Synchronizing slots from primary to standby
Hi, On Tue, Apr 02, 2024 at 04:24:49AM +, Zhijie Hou (Fujitsu) wrote: > On Monday, April 1, 2024 9:28 PM Bertrand Drouvot > wrote: > > > > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > > > > > > > > > 2 === > > > > > > > > + { > > > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > > > + { > > > > > > > > That could call SnapBuildSnapshotExists() multiple times for the > > > > same "restart_lsn" (for example in case of multiple remote slots to > > > > sync). > > > > > > > > What if the sync worker records the last lsn it asks for > > > > serialization (and serialized ? Then we could check that value first > > > > before deciding to call (or not) > > > > SnapBuildSnapshotExists() on it? > > > > > > > > It's not ideal because it would record "only the last one" but that > > > > would be simple enough for now (currently there is only one sync > > > > worker so that scenario is likely to happen). > > > > > > > > > > Yeah, we could do that but I am not sure how much it can help. I guess > > > we could do some tests to see if it helps. > > > > Yeah not sure either. I just think it can only help and shouldn't make > > things > > worst (but could avoid extra SnapBuildSnapshotExists() calls). > > Thanks for the idea. I tried some tests based on Nisha's setup[1]. Thank you and Nisha and Shveta for the testing! > I tried to > advance the slots on the primary to the same restart_lsn before calling > sync_replication_slots(), and reduced the data generated by pgbench. Agree that this scenario makes sense to try to see the impact of SnapBuildSnapshotExists(). > The SnapBuildSnapshotExists is still not noticeable in the profile. SnapBuildSnapshotExists() number of calls are probably negligeable when compared to the IO calls generated by the fast forward logical decoding in this scenario. > So, I feel we > could leave this as a further improvement once we encounter scenarios where > the duplicate SnapBuildSnapshotExists call becomes noticeable. Sounds reasonable to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 5:05 PM Amit Kapila wrote: > > > 2 === > > > > + { > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > + { > > > > That could call SnapBuildSnapshotExists() multiple times for the same > > "restart_lsn" (for example in case of multiple remote slots to sync). > > > > What if the sync worker records the last lsn it asks for serialization (and > > serialized)? Then we could check that value first before deciding to call > > (or not) > > SnapBuildSnapshotExists() on it? > > > > It's not ideal because it would record "only the last one" but that would be > > simple enough for now (currently there is only one sync worker so that > > scenario > > is likely to happen). > > > > Yeah, we could do that but I am not sure how much it can help. I guess > we could do some tests to see if it helps. I had a look at the test results conducted by Nisha and did not find any repetitive restart_lsn from the primary being synced to the standby for that particular test of 100 slots. Unless we have some concrete test in mind (having repetitive restart_lsn), I do not think that the given tests can establish the benefit of the suggested optimization. Attached are the log files of the all-slots test for reference. Thanks, Shveta

(excerpt of the attached per-slot listing; the remaining rows, one per slot, are truncated in the archive)

 slot_name | database | xmin | cxmin | restart_lsn | flushlsn   | wal_status | conf | failover | synced | temp
 slot_1    | newdb1   |      | 772   | 0/AD0       | 0/E43E9240 | reserved   | f    | t        | f      | f
 slot_2    | newdb1   |      | 772   | 0/A000108   | 0/E43E9240 | reserved   | f    | t        | f      | f
 ...
RE: Synchronizing slots from primary to standby
On Tuesday, April 2, 2024 8:43 AM Bharath Rupireddy wrote: > > On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu) > wrote: > > > > Attach the V4 patch which includes the optimization to skip the > > decoding if the snapshot at the syncing restart_lsn is already > > serialized. It can avoid most of the duplicate decoding in my test, and I am > doing some more tests locally. > > Thanks for the patch. I'm thinking if we can reduce the amount of work that we > do for synced slots in each sync worker cycle. With that context in mind, why > do > we need to create decoding context every time? > Can't we create it once, store it in an in-memory structure and use it for > each > sync worker cycle? Is there any problem with it? What do you think? Thanks for the idea. I think the cost of creating the decoding context is relatively minor when compared to the IO cost. After generating profiles for the tests shared by Nisha[1], it appears that StartupDecodingContext is not an issue. While the suggested refactoring is an option, I think we can consider it as a future improvement and address it only if we encounter scenarios where the creation of the decoding context becomes a bottleneck. [1] https://www.postgresql.org/message-id/CALj2ACUeij5tFzJ1-cuoUh%2Bmhj33v%2BYgqD_gHYUpRdXSCSBbhw%40mail.gmail.com Best Regards, Hou zj
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 6:58 PM Bertrand Drouvot wrote: > > On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > > wrote: > > > Then there is no need to call WaitForStandbyConfirmation() as it could go > > > until > > > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we > > > already > > > know it). > > > > > > > Won't it will normally return from the first check in > > WaitForStandbyConfirmation() because standby_slot_names_config is not > > set on standby? > > I think standby_slot_names can be set on a standby. One could want to set it > in > a cascading standby env (though it won't have any real effects until the > standby > is promoted). > Yeah, it is possible but doesn't seem worth additional checks for this micro-optimization. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 11:36 AM Zhijie Hou (Fujitsu) wrote: > > Attach the V4 patch which includes the optimization to skip the decoding if > the snapshot at the syncing restart_lsn is already serialized. It can avoid > most > of the duplicate decoding in my test, and I am doing some more tests locally. Thanks for the patch. I'm thinking if we can reduce the amount of work that we do for synced slots in each sync worker cycle. With that context in mind, why do we need to create decoding context every time? Can't we create it once, store it in an in-memory structure and use it for each sync worker cycle? Is there any problem with it? What do you think? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Synchronizing slots from primary to standby
On Monday, April 1, 2024 7:30 PM Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) > wrote: > > > > On Friday, March 29, 2024 2:50 PM Amit Kapila > wrote: > > > > > > > > > > > > > > 2. > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, > > > + bool *found_consistent_point); > > > + > > > > > > This API looks a bit awkward as the functionality doesn't match the > > > name. How about having a function with name > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > > ready_for_decoding) with the same functionality as your patch has > > > for > > > pg_logical_replication_slot_advance() and then invoke it both from > > > pg_logical_replication_slot_advance and slotsync.c. The function > > > name is too big, we can think of a shorter name. Any ideas? > > > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > > LogicalSlotAdvanceAndCheckDecoding()? > > > > It is about snapbuild state, so how about naming the function as > LogicalSlotAdvanceAndCheckSnapState()? It looks better to me, so changed. > > I have made quite a few cosmetic changes in comments and code. See > attached. This is atop your latest patch. Can you please review and include > these changes in the next version? Thanks, I have reviewed and merged them. Attach the V5 patch set which addressed above comments and ran pgindent. I will think and test the improvement suggested by Bertrand[1] and reply after that. [1] https://www.postgresql.org/message-id/Zgp8n9QD5nYSESnM%40ip-10-97-1-34.eu-west-3.compute.internal Best Regards, Hou zj v5-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v5-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
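A rough sketch of how the two call sites can share the renamed routine, assuming the SQL-facing wrapper keeps its original shape and simply passes NULL for the out parameter it does not need:

    extern XLogRecPtr LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr moveto,
                                                          bool *found_consistent_snapshot);

    /* wrapper used by pg_replication_slot_advance(); ignores the snapbuild state */
    static XLogRecPtr
    pg_logical_replication_slot_advance(XLogRecPtr moveto)
    {
        return LogicalSlotAdvanceAndCheckSnapState(moveto, NULL);
    }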
Re: Synchronizing slots from primary to standby
Hi, On Mon, Apr 01, 2024 at 05:04:53PM +0530, Amit Kapila wrote: > On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot > wrote: > > Then there is no need to call WaitForStandbyConfirmation() as it could go > > until > > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we > > already > > know it). > > > > Won't it will normally return from the first check in > WaitForStandbyConfirmation() because standby_slot_names_config is not > set on standby? I think standby_slot_names can be set on a standby. One could want to set it in a cascading standby env (though it won't have any real effects until the standby is promoted). > > > 2 === > > > > + { > > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > > + { > > > > That could call SnapBuildSnapshotExists() multiple times for the same > > "restart_lsn" (for example in case of multiple remote slots to sync). > > > > What if the sync worker records the last lsn it asks for serialization (and > > serialized ? Then we could check that value first before deciding to call > > (or not) > > SnapBuildSnapshotExists() on it? > > > > It's not ideal because it would record "only the last one" but that would be > > simple enough for now (currently there is only one sync worker so that > > scenario > > is likely to happen). > > > > Yeah, we could do that but I am not sure how much it can help. I guess > we could do some tests to see if it helps. Yeah not sure either. I just think it can only help and shouldn't make things worst (but could avoid extra SnapBuildSnapshotExists() calls). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
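A minimal sketch of the shortcut discussed in the quoted exchange, assuming the advance routine can simply consult IsSyncingReplicationSlots() instead of relying on standby_slot_names being unset on the standby:

    /*
     * Slot sync runs on a standby, where waiting for physical standbys to
     * confirm the WAL makes no sense; skip the wait in that case.
     */
    if (!IsSyncingReplicationSlots())
        WaitForStandbyConfirmation(moveto);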
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila wrote: > > After this step and before the next, did you ensure that the slot sync > has synced the latest confirmed_flush/restart LSNs? You can query: > "select slot_name,restart_lsn, confirmed_flush_lsn from > pg_replication_slots;" to ensure the same on both the primary and > standby. Yes, after ensuring the slot is synced on the standby, the problem is reproduced for me and the proposed patch fixes it (i.e. able to see the changes even after the promotion). I'm just thinking if we can add a TAP test for this issue, but one key aspect of this reproducer is to not let someone write a RUNNING_XACTS WAL record on the primary in between before the standby promotion. Setting bgwriter_delay to max isn't helping me. I think we can think of using an injection point to add delay in LogStandbySnapshot() for having this problem reproduced consistently in a TAP test. Perhaps, we can think of adding this later after the fix is shipped. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 2:51 PM Bertrand Drouvot wrote: > > Hi, > > On Mon, Apr 01, 2024 at 06:05:34AM +, Zhijie Hou (Fujitsu) wrote: > > On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) > > wrote: > > Attach the V4 patch which includes the optimization to skip the decoding if > > the snapshot at the syncing restart_lsn is already serialized. It can avoid > > most > > of the duplicate decoding in my test, and I am doing some more tests > > locally. > > > > Thanks! > > 1 === > > Same comment as in [1]. > > In LogicalSlotAdvanceAndCheckReadynessForDecoding(), if we are synchronizing > slots > then I think that we can skip: > > + /* > +* Wait for specified streaming replication standby servers > (if any) > +* to confirm receipt of WAL up to moveto lsn. > +*/ > + WaitForStandbyConfirmation(moveto); > > Indeed if we are dealing with synced slot then we know we're in > RecoveryInProgress(). > > Then there is no need to call WaitForStandbyConfirmation() as it could go > until > the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we > already > know it). > Won't it will normally return from the first check in WaitForStandbyConfirmation() because standby_slot_names_config is not set on standby? > 2 === > > + { > + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) > + { > > That could call SnapBuildSnapshotExists() multiple times for the same > "restart_lsn" (for example in case of multiple remote slots to sync). > > What if the sync worker records the last lsn it asks for serialization (and > serialized ? Then we could check that value first before deciding to call (or > not) > SnapBuildSnapshotExists() on it? > > It's not ideal because it would record "only the last one" but that would be > simple enough for now (currently there is only one sync worker so that > scenario > is likely to happen). > Yeah, we could do that but I am not sure how much it can help. I guess we could do some tests to see if it helps. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila wrote: > > > > > > > > > 2. > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, > > + bool *found_consistent_point); > > + > > > > This API looks a bit awkward as the functionality doesn't match the name. > > How > > about having a function with name > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > ready_for_decoding) with the same functionality as your patch has for > > pg_logical_replication_slot_advance() and then invoke it both from > > pg_logical_replication_slot_advance and slotsync.c. The function name is too > > big, we can think of a shorter name. Any ideas? > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > LogicalSlotAdvanceAndCheckDecoding()? > It is about snapbuild state, so how about naming the function as LogicalSlotAdvanceAndCheckSnapState()? I have made quite a few cosmetic changes in comments and code. See attached. This is atop your latest patch. Can you please review and include these changes in the next version? -- With Regards, Amit Kapila. diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c index bbc7cdaf50..7d5884789c 100644 --- a/src/backend/replication/logical/logical.c +++ b/src/backend/replication/logical/logical.c @@ -518,10 +518,9 @@ CreateDecodingContext(XLogRecPtr start_lsn, errmsg("cannot use physical replication slot for logical decoding"))); /* -* Do not allow decoding if the replication slot belongs to a different -* database unless we are in fast-forward mode. In fast-forward mode, we -* ignore storage-level changes and do not need to access the database -* object. +* We need to access the system tables during decoding to build the logical +* changes unless we are in fast-forward mode where no changes are +* generated. */ if (slot->data.database != MyDatabaseId && !fast_forward) ereport(ERROR, @@ -530,9 +529,9 @@ CreateDecodingContext(XLogRecPtr start_lsn, NameStr(slot->data.name; /* -* Do not allow consumption of a "synchronized" slot until the standby gets -* promoted unless we are syncing replication slots, in which case we need -* to advance the LSN and xmin of the slot during decoding. +* The slots being synced from the primary can't be used for decoding as +* they are used after failover. However, we do allow advancing the LSNs +* during the synchronization of slots. See update_local_synced_slot. */ if (RecoveryInProgress() && slot->data.synced && !IsSyncingReplicationSlots()) ereport(ERROR, @@ -2054,8 +2053,8 @@ LogicalReplicationSlotHasPendingWal(XLogRecPtr end_of_wal) * WAL and removal of old catalog tuples. As decoding is done in fast_forward * mode, no changes are generated anyway. * - * *ready_for_decoding will be set to true if the logical decoding reaches - * the consistent point; Otherwise, it will be set to false. + * *ready_for_decoding will be true if the initial decoding snapshot has + * been built; Otherwise, it will be false. */ XLogRecPtr LogicalSlotAdvanceAndCheckReadynessForDecoding(XLogRecPtr moveto, diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c index 886a9fcc7e..9f6b83d486 100644 --- a/src/backend/replication/logical/slotsync.c +++ b/src/backend/replication/logical/slotsync.c @@ -26,10 +26,13 @@ * pg_sync_replication_slots() periodically to perform the syncs. 
* * If synchronized slots fail to build a consistent snapshot from the - * restart_lsn, they would become unreliable after promotion due to potential - * data loss from changes before reaching a consistent point. So, we mark such - * slots as RS_TEMPORARY. Once they successfully reach the consistent point, - * they will be marked to RS_PERSISTENT. + * restart_lsn before reaching confirmed_flush_lsn, they would become + * unreliable after promotion due to potential data loss from changes + * before reaching a consistent point. This can happen because the slots can + * be synced at some random time and we may not reach the consistent point + * at the same WAL location as the primary. So, we mark such slots as + * RS_TEMPORARY. Once the decoding from corresponding LSNs can reach a + * consistent point, they will be marked as RS_PERSISTENT. * * The slot sync worker waits for some time before the next synchronization, * with the duration varying based on whether any slots were updated during @@ -155,9 +158,9 @@ static void slotsync_failure_callback(int code, Datum arg); * If no update was needed (the data of the remote slot is the same as the * local slot) return false, otherwise true. * - * If the LSN of the slot is mo
Re: Synchronizing slots from primary to standby
Did performance test on optimization patch (v2-0001-optimize-the-slot-advancement.patch). Please find the results: Setup: - One primary node with 100 failover-enabled logical slots - 20 DBs, each having 5 failover-enabled logical replication slots - One physical standby node with 'sync_replication_slots' as off but other parameters required by slot-sync as enabled. Node Configurations: please see config.txt Test Plan: 1) Create 20 Databases on Primary node, each with 5 failover slots using "pg_create_logical_replication_slot()". Overall 100 failover slots. 2) Use pg_sync_replication_slot() to sync them to the standby. Note the execution time of sync and lsns values. 3) On Primary node, run pgbench for 15 mins on postgres db 4) Advance lsns of all the 100 slots on primary using pg_replication_slot_advance(). 5) Use pg_sync_replication_slot() to sync slots to the standby. Note the execution time of sync and lsns values. Executed the above test plan for three cases and did time elapsed comparison for the pg_replication_slot_advance()- (1) HEAD Time taken by pg_sync_replication_slot() on Standby node - a) The initial sync (step 2) = 140.208 ms b) Sync after pgbench run on primary (step 5) = 66.994 ms (2) HEAD + v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch a) The initial sync (step 2) = 163.885 ms b) Sync after pgbench run on primary (step 5) = 837901.290 ms (13:57.901) >> With v3 patch, the pg_sync_replication_slot() takes a significant amount of time to sync the slots. (3) HEAD + v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch + v2-0001-optimize-the-slot-advancement.patch a) The initial sync (step 2) = 165.554 ms b) Sync after pgbench run on primary (step 5) = 7991.718 ms (00:07.992) >> With the optimization patch, the time taken by pg_sync_replication_slot() is reduced significantly to ~7 seconds. We did the same test with a single DB too by creating all 100 failover slots in postgres DB and the results were almost similar. Attached the scripts used for the test - "v3_perf_test_scripts.tar.gz" include files - setup_multidb.sh : setup primary and standby nodes createdb20.sql : create 20 DBs createslot20.sql : create total 100 logical slots, 5 on each DB run_sync.sql : call pg_replication_slot_advance() with timing advance20.sql : advance lsn of all slots on Primary node to current lsn advance20_perdb.sql : use on HEAD to advance lsn on Primary node get_synced_data.sql : get details of the config.txt : configuration used for nodes v3_perf_test_scripts.tar.gz Description: GNU Zip compressed data
Re: Synchronizing slots from primary to standby
Hi, On Mon, Apr 01, 2024 at 06:05:34AM +, Zhijie Hou (Fujitsu) wrote: > On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) > wrote: > Attach the V4 patch which includes the optimization to skip the decoding if > the snapshot at the syncing restart_lsn is already serialized. It can avoid > most > of the duplicate decoding in my test, and I am doing some more tests locally. > Thanks! 1 === Same comment as in [1]. In LogicalSlotAdvanceAndCheckReadynessForDecoding(), if we are synchronizing slots then I think that we can skip: + /* +* Wait for specified streaming replication standby servers (if any) +* to confirm receipt of WAL up to moveto lsn. +*/ + WaitForStandbyConfirmation(moveto); Indeed if we are dealing with synced slot then we know we're in RecoveryInProgress(). Then there is no need to call WaitForStandbyConfirmation() as it could go until the RecoveryInProgress() in StandbySlotsHaveCaughtup() for nothing (as we already know it). 2 === + { + if (SnapBuildSnapshotExists(remote_slot->restart_lsn)) + { That could call SnapBuildSnapshotExists() multiple times for the same "restart_lsn" (for example in case of multiple remote slots to sync). What if the sync worker records the last lsn it asks for serialization (and serialized ? Then we could check that value first before deciding to call (or not) SnapBuildSnapshotExists() on it? It's not ideal because it would record "only the last one" but that would be simple enough for now (currently there is only one sync worker so that scenario is likely to happen). Maybe an idea for future improvement (not for now) could be that SnapBuildSerialize() maintains a "small list" of "already serialized" snapshots. [1]: https://www.postgresql.org/message-id/ZgayTFIhLfzhpHci%40ip-10-97-1-34.eu-west-3.compute.internal Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
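A minimal sketch of the caching idea in comment 2, assuming a single static variable in the sync worker is enough to remember the last restart_lsn for which a serialized snapshot was found (it records "only the last one", as noted above):

    static XLogRecPtr last_serialized_lsn = InvalidXLogRecPtr;

    if (remote_slot->restart_lsn == last_serialized_lsn ||
        SnapBuildSnapshotExists(remote_slot->restart_lsn))
    {
        /* snapshot already on disk: update the local slot directly, no decoding needed */
        last_serialized_lsn = remote_slot->restart_lsn;
    }
    else
    {
        /* fall back to fast-forward decoding, as the patch does today */
    }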
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 10:40 AM Amit Kapila wrote: > > On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy > wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > wrote: > > > > > > [2] The steps to reproduce the data miss issue on a primary->standby > > > setup: > > > > I'm trying to reproduce the problem with [1], but I can see the > > changes after the standby is promoted. Am I missing anything here? > > > > ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select * > > from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);" > > lsn| xid |data > > ---+-+- > > 0/3B0 | 738 | BEGIN 738 > > 0/3017FC8 | 738 | COMMIT 738 > > 0/3017FF8 | 739 | BEGIN 739 > > 0/3019A38 | 739 | COMMIT 739 > > 0/3019A38 | 740 | BEGIN 740 > > 0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999 > > 0/3019AA8 | 740 | COMMIT 740 > > (7 rows) > > > > [1] > > -#define LOG_SNAPSHOT_INTERVAL_MS 15000 > > +#define LOG_SNAPSHOT_INTERVAL_MS 150 > > > > ./initdb -D db17 > > echo "archive_mode = on > > archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f' > > wal_level='logical' > > autovacuum = off > > checkpoint_timeout='1h'" | tee -a db17/postgresql.conf > > > > ./pg_ctl -D db17 -l logfile17 start > > > > rm -rf sbdata logfilesbdata > > ./pg_basebackup -D sbdata > > > > ./psql -d postgres -p 5432 -c "SELECT > > pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding', > > false, false, true);" > > ./psql -d postgres -p 5432 -c "SELECT > > pg_create_physical_replication_slot('phy_repl_slot', true, false);" > > > > echo "port=5433 > > primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu' > > primary_slot_name='phy_repl_slot' > > restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p' > > hot_standby_feedback=on > > sync_replication_slots=on" | tee -a sbdata/postgresql.conf > > > > touch sbdata/standby.signal > > > > ./pg_ctl -D sbdata -l logfilesbdata start > > ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" > > > > ./psql -d postgres > > > > SESSION1, TXN1 > > BEGIN; > > create table dummy1(a int); > > > > SESSION2, TXN2 > > SELECT pg_log_standby_snapshot(); > > > > SESSION1, TXN1 > > COMMIT; > > > > SESSION1, TXN1 > > BEGIN; > > create table dummy2(a int); > > > > SESSION2, TXN2 > > SELECT pg_log_standby_snapshot(); > > > > SESSION1, TXN1 > > COMMIT; > > > > ./psql -d postgres -p 5432 -c "SELECT > > pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());" > > > > After this step and before the next, did you ensure that the slot sync > has synced the latest confirmed_flush/restart LSNs? You can query: > "select slot_name,restart_lsn, confirmed_flush_lsn from > pg_replication_slots;" to ensure the same on both the primary and > standby. +1. To ensure last sync, one can run this manually on standby just before promotion : SELECT pg_sync_replication_slots(); thanks Shveta
RE: Synchronizing slots from primary to standby
On Monday, April 1, 2024 8:56 AM Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:50 PM Amit Kapila > wrote: > > > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > > > > Attach a new version patch which fixed an un-initialized variable > > > issue and added some comments. > > > > > > > The other approach to fix this issue could be that the slotsync worker > > get the serialized snapshot using pg_read_binary_file() corresponding > > to restart_lsn and writes those at standby. But there are cases when > > we won't have such a file like (a) when we initially create the slot > > and reach the consistent_point, or (b) also by the time the slotsync > > worker starts to read the remote snapshot file, the snapshot file > > could have been removed by the checkpointer on the primary (if the > > restart_lsn of the remote has been advanced in this window). So, in > > such cases, we anyway need to advance the slot. I think these could be > optimizations that we could do in the future. > > > > Few comments: > > Thanks for the comments. > > > = > > 1. > > - if (slot->data.database != MyDatabaseId) > > + if (slot->data.database != MyDatabaseId && !fast_forward) > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("replication slot \"%s\" was not created in this database", > > @@ -526,7 > > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, > > * Do not allow consumption of a "synchronized" slot until the standby > > * gets promoted. > > */ > > - if (RecoveryInProgress() && slot->data.synced) > > + if (RecoveryInProgress() && slot->data.synced && > > + !IsSyncingReplicationSlots()) > > > > > > Add comments at both of the above places. > > Added. > > > > > > > 2. > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr > moveto, > > + bool *found_consistent_point); > > + > > > > This API looks a bit awkward as the functionality doesn't match the > > name. How about having a function with name > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > > ready_for_decoding) with the same functionality as your patch has for > > pg_logical_replication_slot_advance() and then invoke it both from > > pg_logical_replication_slot_advance and slotsync.c. The function name > > is too big, we can think of a shorter name. Any ideas? > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just > LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested > LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be > renamed in next version if we agree). > > Attach the V3 patch which addressed above comments and Kuroda-san's > comments[1]. I also adjusted the tap-test to only check the > confirmed_flush_lsn > after syncing, as the restart_lsn could be different from the remote one due > to > the new slot_advance() call. I am also testing some optimization idea locally > and > will share if ready. Attach the V4 patch which includes the optimization to skip the decoding if the snapshot at the syncing restart_lsn is already serialized. It can avoid most of the duplicate decoding in my test, and I am doing some more tests locally. Best Regards, Hou zj v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v4-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Re: Synchronizing slots from primary to standby
On Mon, Apr 1, 2024 at 10:01 AM Bharath Rupireddy wrote: > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > wrote: > > > > [2] The steps to reproduce the data miss issue on a primary->standby setup: > > I'm trying to reproduce the problem with [1], but I can see the > changes after the standby is promoted. Am I missing anything here? > > ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select * > from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);" > lsn| xid |data > ---+-+- > 0/3B0 | 738 | BEGIN 738 > 0/3017FC8 | 738 | COMMIT 738 > 0/3017FF8 | 739 | BEGIN 739 > 0/3019A38 | 739 | COMMIT 739 > 0/3019A38 | 740 | BEGIN 740 > 0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999 > 0/3019AA8 | 740 | COMMIT 740 > (7 rows) > > [1] > -#define LOG_SNAPSHOT_INTERVAL_MS 15000 > +#define LOG_SNAPSHOT_INTERVAL_MS 150 > > ./initdb -D db17 > echo "archive_mode = on > archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f' > wal_level='logical' > autovacuum = off > checkpoint_timeout='1h'" | tee -a db17/postgresql.conf > > ./pg_ctl -D db17 -l logfile17 start > > rm -rf sbdata logfilesbdata > ./pg_basebackup -D sbdata > > ./psql -d postgres -p 5432 -c "SELECT > pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding', > false, false, true);" > ./psql -d postgres -p 5432 -c "SELECT > pg_create_physical_replication_slot('phy_repl_slot', true, false);" > > echo "port=5433 > primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu' > primary_slot_name='phy_repl_slot' > restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p' > hot_standby_feedback=on > sync_replication_slots=on" | tee -a sbdata/postgresql.conf > > touch sbdata/standby.signal > > ./pg_ctl -D sbdata -l logfilesbdata start > ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" > > ./psql -d postgres > > SESSION1, TXN1 > BEGIN; > create table dummy1(a int); > > SESSION2, TXN2 > SELECT pg_log_standby_snapshot(); > > SESSION1, TXN1 > COMMIT; > > SESSION1, TXN1 > BEGIN; > create table dummy2(a int); > > SESSION2, TXN2 > SELECT pg_log_standby_snapshot(); > > SESSION1, TXN1 > COMMIT; > > ./psql -d postgres -p 5432 -c "SELECT > pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());" > After this step and before the next, did you ensure that the slot sync has synced the latest confirmed_flush/restart LSNs? You can query: "select slot_name,restart_lsn, confirmed_flush_lsn from pg_replication_slots;" to ensure the same on both the primary and standby. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) wrote: > > [2] The steps to reproduce the data miss issue on a primary->standby setup: I'm trying to reproduce the problem with [1], but I can see the changes after the standby is promoted. Am I missing anything here? ubuntu:~/postgres/pg17/bin$ ./psql -d postgres -p 5433 -c "select * from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);" lsn| xid |data ---+-+- 0/3B0 | 738 | BEGIN 738 0/3017FC8 | 738 | COMMIT 738 0/3017FF8 | 739 | BEGIN 739 0/3019A38 | 739 | COMMIT 739 0/3019A38 | 740 | BEGIN 740 0/3019A38 | 740 | table public.dummy1: INSERT: a[integer]:999 0/3019AA8 | 740 | COMMIT 740 (7 rows) [1] -#define LOG_SNAPSHOT_INTERVAL_MS 15000 +#define LOG_SNAPSHOT_INTERVAL_MS 150 ./initdb -D db17 echo "archive_mode = on archive_command='cp %p /home/ubuntu/postgres/pg17/bin/archived_wal/%f' wal_level='logical' autovacuum = off checkpoint_timeout='1h'" | tee -a db17/postgresql.conf ./pg_ctl -D db17 -l logfile17 start rm -rf sbdata logfilesbdata ./pg_basebackup -D sbdata ./psql -d postgres -p 5432 -c "SELECT pg_create_logical_replication_slot('lrep_sync_slot', 'test_decoding', false, false, true);" ./psql -d postgres -p 5432 -c "SELECT pg_create_physical_replication_slot('phy_repl_slot', true, false);" echo "port=5433 primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu' primary_slot_name='phy_repl_slot' restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p' hot_standby_feedback=on sync_replication_slots=on" | tee -a sbdata/postgresql.conf touch sbdata/standby.signal ./pg_ctl -D sbdata -l logfilesbdata start ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" ./psql -d postgres SESSION1, TXN1 BEGIN; create table dummy1(a int); SESSION2, TXN2 SELECT pg_log_standby_snapshot(); SESSION1, TXN1 COMMIT; SESSION1, TXN1 BEGIN; create table dummy2(a int); SESSION2, TXN2 SELECT pg_log_standby_snapshot(); SESSION1, TXN1 COMMIT; ./psql -d postgres -p 5432 -c "SELECT pg_replication_slot_advance('lrep_sync_slot', pg_current_wal_lsn());" ./psql -d postgres -p 5432 -c "INSERT INTO dummy1 VALUES(999);" ./psql -d postgres -p 5433 -c "SELECT pg_promote();" ./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();" ./psql -d postgres -p 5433 -c "select * from pg_logical_slot_get_changes('lrep_sync_slot', NULL, NULL);" -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Synchronizing slots from primary to standby
On Friday, March 29, 2024 2:50 PM Amit Kapila wrote: > > On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) > wrote: > > > > > > Attach a new version patch which fixed an un-initialized variable > > issue and added some comments. > > > > The other approach to fix this issue could be that the slotsync worker get the > serialized snapshot using pg_read_binary_file() corresponding to restart_lsn > and writes those at standby. But there are cases when we won't have such a > file > like (a) when we initially create the slot and reach the consistent_point, or > (b) > also by the time the slotsync worker starts to read the remote snapshot file, > the > snapshot file could have been removed by the checkpointer on the primary (if > the restart_lsn of the remote has been advanced in this window). So, in such > cases, we anyway need to advance the slot. I think these could be > optimizations > that we could do in the future. > > Few comments: Thanks for the comments. > = > 1. > - if (slot->data.database != MyDatabaseId) > + if (slot->data.database != MyDatabaseId && !fast_forward) > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("replication slot \"%s\" was not created in this database", @@ -526,7 > +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, > * Do not allow consumption of a "synchronized" slot until the standby > * gets promoted. > */ > - if (RecoveryInProgress() && slot->data.synced) > + if (RecoveryInProgress() && slot->data.synced && > + !IsSyncingReplicationSlots()) > > > Add comments at both of the above places. Added. > > > 2. > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, > + bool *found_consistent_point); > + > > This API looks a bit awkward as the functionality doesn't match the name. How > about having a function with name > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, > ready_for_decoding) with the same functionality as your patch has for > pg_logical_replication_slot_advance() and then invoke it both from > pg_logical_replication_slot_advance and slotsync.c. The function name is too > big, we can think of a shorter name. Any ideas? How about LogicalSlotAdvanceAndCheckDecodingState() Or just LogicalSlotAdvanceAndCheckDecoding()? (I used the suggested LogicalSlotAdvanceAndCheckReadynessForDecoding in this version, It can be renamed in next version if we agree). Attach the V3 patch which addressed above comments and Kuroda-san's comments[1]. I also adjusted the tap-test to only check the confirmed_flush_lsn after syncing, as the restart_lsn could be different from the remote one due to the new slot_advance() call. I am also testing some optimization idea locally and will share if ready. [1] https://www.postgresql.org/message-id/TYCPR01MB1207757BB2A32B6815CE1CCE7F53A2%40TYCPR01MB12077.jpnprd01.prod.outlook.com Best Regards, Hou zj v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v3-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
Re: Synchronizing slots from primary to standby
Hi, On Fri, Mar 29, 2024 at 02:35:22PM +0530, Amit Kapila wrote: > On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot > wrote: > > > > On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote: > > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot > > > wrote: > > > > > > > > Hi, > > > > > > > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > > > > > Attach a new version patch which fixed an un-initialized variable > > > > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > > > > 040 tap-test so that we can analyze the possible CFbot failures > > > > > easily. > > > > > > > > > > > > > Thanks! > > > > > > > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > > > > + { > > > > + /* > > > > +* By advancing the restart_lsn, confirmed_lsn, and > > > > xmin using > > > > +* fast-forward logical decoding, we ensure that the > > > > required > > > > snapshots > > > > +* are saved to disk. This enables logical decoding to > > > > quickly > > > > reach a > > > > +* consistent point at the restart_lsn, eliminating the > > > > risk of > > > > missing > > > > +* data during snapshot creation. > > > > +*/ > > > > + > > > > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > > > > + > > > > found_consistent_point); > > > > + ReplicationSlotsComputeRequiredLSN(); > > > > + updated_lsn = true; > > > > + } > > > > > > > > Instead of using pg_logical_replication_slot_advance() for each synced > > > > slot and > > > > during sync cycles what about?: > > > > > > > > - keep sync slot synchronization as it is currently (not using > > > > pg_logical_replication_slot_advance()) > > > > - create "an hidden" logical slot if sync slot feature is on > > > > - at the time of promotion use pg_logical_replication_slot_advance() on > > > > this > > > > hidden slot only to advance to the max lsn of the synced slots > > > > > > > > I'm not sure that would be enough, just asking your thoughts on this > > > > (benefits > > > > would be to avoid calling pg_logical_replication_slot_advance() on each > > > > sync > > > > slots and during the sync cycles). > > > > > > Thanks for the idea ! > > > > > > I considered about this. I think advancing the "hidden" slot on promotion > > > may be a > > > bit late, because if we cannot reach the consistent point after advancing > > > the > > > "hidden" slot, then it means we may need to remove all the synced slots > > > as we > > > are not sure if they are usable(will not loss data) after promotion. > > > > What about advancing the hidden slot during the sync cycles then? > > > > > The current approach is to mark such un-consistent slot as temp and > > > persist > > > them once it reaches consistent point, so that user can ensure the slot > > > can be > > > used after promotion once persisted. > > > > Right, but do we need to do so for all the sync slots? Would a single hidden > > slot be enough? > > > > Even if we mark one of the synced slots as persistent without reaching > a consistent state, it could create a problem after promotion. And, > how a single hidden slot would serve the purpose, different synced > slots will have different restart/confirmed_flush LSN and we won't be > able to perform advancing for those using a single slot. For example, > say for first synced slot, it has not reached a consistent state and > then how can it try for the second slot? This sounds quite tricky to > make work. We should go with something simple where the chances of > introducing bugs are lesser. 
Yeah, better to go with something simple. + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) + { + /* +* By advancing the restart_lsn, confirmed_lsn, and xmin using +* fast-forward logical decoding, we ensure that the required snapshots +* are saved to disk. This enables logical decoding to quickly reach a +* consistent point at the restart_lsn, eliminating the risk of missing +* data during snapshot creation. +*/ + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, + found_consistent_point); In our case, what about skipping WaitForStandbyConfirmation() in pg_logical_replication_slot_advance()? (It could go until the RecoveryInProgress() check in StandbySlotsHaveCaughtup() if we don't skip it). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Fri, Mar 29, 2024 at 1:08 PM Bertrand Drouvot wrote: > > On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote: > > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot > > wrote: > > > > > > Hi, > > > > > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > > > > Attach a new version patch which fixed an un-initialized variable > > > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > > > > > > > Thanks! > > > > > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > > > + { > > > + /* > > > +* By advancing the restart_lsn, confirmed_lsn, and xmin > > > using > > > +* fast-forward logical decoding, we ensure that the > > > required > > > snapshots > > > +* are saved to disk. This enables logical decoding to > > > quickly > > > reach a > > > +* consistent point at the restart_lsn, eliminating the > > > risk of > > > missing > > > +* data during snapshot creation. > > > +*/ > > > + > > > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > > > + > > > found_consistent_point); > > > + ReplicationSlotsComputeRequiredLSN(); > > > + updated_lsn = true; > > > + } > > > > > > Instead of using pg_logical_replication_slot_advance() for each synced > > > slot and > > > during sync cycles what about?: > > > > > > - keep sync slot synchronization as it is currently (not using > > > pg_logical_replication_slot_advance()) > > > - create "an hidden" logical slot if sync slot feature is on > > > - at the time of promotion use pg_logical_replication_slot_advance() on > > > this > > > hidden slot only to advance to the max lsn of the synced slots > > > > > > I'm not sure that would be enough, just asking your thoughts on this > > > (benefits > > > would be to avoid calling pg_logical_replication_slot_advance() on each > > > sync > > > slots and during the sync cycles). > > > > Thanks for the idea ! > > > > I considered about this. I think advancing the "hidden" slot on promotion > > may be a > > bit late, because if we cannot reach the consistent point after advancing > > the > > "hidden" slot, then it means we may need to remove all the synced slots as > > we > > are not sure if they are usable(will not loss data) after promotion. > > What about advancing the hidden slot during the sync cycles then? > > > The current approach is to mark such un-consistent slot as temp and persist > > them once it reaches consistent point, so that user can ensure the slot can > > be > > used after promotion once persisted. > > Right, but do we need to do so for all the sync slots? Would a single hidden > slot be enough? > Even if we mark one of the synced slots as persistent without reaching a consistent state, it could create a problem after promotion. And, how a single hidden slot would serve the purpose, different synced slots will have different restart/confirmed_flush LSN and we won't be able to perform advancing for those using a single slot. For example, say for first synced slot, it has not reached a consistent state and then how can it try for the second slot? This sounds quite tricky to make work. We should go with something simple where the chances of introducing bugs are lesser. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Fri, Mar 29, 2024 at 9:34 AM Hayato Kuroda (Fujitsu) wrote: > > Thanks for updating the patch! Here is a comment for it. > > ``` > +/* > + * By advancing the restart_lsn, confirmed_lsn, and xmin using > + * fast-forward logical decoding, we can verify whether a consistent > + * snapshot can be built. This process also involves saving necessary > + * snapshots to disk during decoding, ensuring that logical decoding > + * efficiently reaches a consistent point at the restart_lsn without > + * the potential loss of data during snapshot creation. > + */ > +pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > +found_consistent_point); > +ReplicationSlotsComputeRequiredLSN(); > +updated_lsn = true; > ``` > > You added them like pg_replication_slot_advance(), but the function also calls > ReplicationSlotsComputeRequiredXmin(false) at that time. According to the > related > commit b48df81 and discussions [1], I know it is needed only for physical > slots, > but it makes more consistent to call requiredXmin() as well, per [2]: > Yeah, I also think it is okay to call for the sake of consistency with pg_replication_slot_advance(). -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
Hi, On Fri, Mar 29, 2024 at 07:23:11AM +, Zhijie Hou (Fujitsu) wrote: > On Friday, March 29, 2024 2:48 PM Bertrand Drouvot > wrote: > > > > Hi, > > > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > > > Attach a new version patch which fixed an un-initialized variable > > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > > > > Thanks! > > > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > > + { > > + /* > > +* By advancing the restart_lsn, confirmed_lsn, and xmin > > using > > +* fast-forward logical decoding, we ensure that the > > required > > snapshots > > +* are saved to disk. This enables logical decoding to > > quickly > > reach a > > +* consistent point at the restart_lsn, eliminating the > > risk of > > missing > > +* data during snapshot creation. > > +*/ > > + > > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > > + > > found_consistent_point); > > + ReplicationSlotsComputeRequiredLSN(); > > + updated_lsn = true; > > + } > > > > Instead of using pg_logical_replication_slot_advance() for each synced slot > > and > > during sync cycles what about?: > > > > - keep sync slot synchronization as it is currently (not using > > pg_logical_replication_slot_advance()) > > - create "an hidden" logical slot if sync slot feature is on > > - at the time of promotion use pg_logical_replication_slot_advance() on this > > hidden slot only to advance to the max lsn of the synced slots > > > > I'm not sure that would be enough, just asking your thoughts on this > > (benefits > > would be to avoid calling pg_logical_replication_slot_advance() on each sync > > slots and during the sync cycles). > > Thanks for the idea ! > > I considered about this. I think advancing the "hidden" slot on promotion may > be a > bit late, because if we cannot reach the consistent point after advancing the > "hidden" slot, then it means we may need to remove all the synced slots as we > are not sure if they are usable(will not loss data) after promotion. What about advancing the hidden slot during the sync cycles then? > The current approach is to mark such un-consistent slot as temp and persist > them once it reaches consistent point, so that user can ensure the slot can be > used after promotion once persisted. Right, but do we need to do so for all the sync slots? Would a single hidden slot be enough? > Another optimization idea is to check the snapshot file existence before > calling the > slot_advance(). If the file already exists, we skip the decoding and directly > update the restart_lsn. This way, we could also avoid some duplicate decoding > work. Yeah, I think it's a good idea (even better if we can do this check without performing any I/O). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Synchronizing slots from primary to standby
On Friday, March 29, 2024 2:48 PM Bertrand Drouvot wrote: > > Hi, > > On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > > Attach a new version patch which fixed an un-initialized variable > > issue and added some comments. Also, temporarily enable DEBUG2 for the > > 040 tap-test so that we can analyze the possible CFbot failures easily. > > > > Thanks! > > + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) > + { > + /* > +* By advancing the restart_lsn, confirmed_lsn, and xmin using > +* fast-forward logical decoding, we ensure that the required > snapshots > +* are saved to disk. This enables logical decoding to quickly > reach a > +* consistent point at the restart_lsn, eliminating the risk > of > missing > +* data during snapshot creation. > +*/ > + > pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, > + > found_consistent_point); > + ReplicationSlotsComputeRequiredLSN(); > + updated_lsn = true; > + } > > Instead of using pg_logical_replication_slot_advance() for each synced slot > and > during sync cycles what about?: > > - keep sync slot synchronization as it is currently (not using > pg_logical_replication_slot_advance()) > - create "an hidden" logical slot if sync slot feature is on > - at the time of promotion use pg_logical_replication_slot_advance() on this > hidden slot only to advance to the max lsn of the synced slots > > I'm not sure that would be enough, just asking your thoughts on this (benefits > would be to avoid calling pg_logical_replication_slot_advance() on each sync > slots and during the sync cycles). Thanks for the idea ! I considered about this. I think advancing the "hidden" slot on promotion may be a bit late, because if we cannot reach the consistent point after advancing the "hidden" slot, then it means we may need to remove all the synced slots as we are not sure if they are usable(will not loss data) after promotion. And it may confuse user a bit as they have seen these slots to be sync-ready. The current approach is to mark such un-consistent slot as temp and persist them once it reaches consistent point, so that user can ensure the slot can be used after promotion once persisted. Another optimization idea is to check the snapshot file existence before calling the slot_advance(). If the file already exists, we skip the decoding and directly update the restart_lsn. This way, we could also avoid some duplicate decoding work. Best Regards, Hou zj
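Incidentally, the temp-until-consistent behaviour described above is visible from SQL on the standby. A small sketch, assuming the synced and temporary columns of pg_replication_slots from the PG17 development tree:

SELECT slot_name, temporary, restart_lsn, confirmed_flush_lsn
  FROM pg_replication_slots
 WHERE synced;

A synced slot that is still marked temporary has not reached a consistent point yet and, per the above, is not guaranteed to be usable after promotion until it has been persisted.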
Re: Synchronizing slots from primary to standby
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) wrote: > > > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. > The other approach to fix this issue could be that the slotsync worker get the serialized snapshot using pg_read_binary_file() corresponding to restart_lsn and writes those at standby. But there are cases when we won't have such a file like (a) when we initially create the slot and reach the consistent_point, or (b) also by the time the slotsync worker starts to read the remote snapshot file, the snapshot file could have been removed by the checkpointer on the primary (if the restart_lsn of the remote has been advanced in this window). So, in such cases, we anyway need to advance the slot. I think these could be optimizations that we could do in the future. Few comments: = 1. - if (slot->data.database != MyDatabaseId) + if (slot->data.database != MyDatabaseId && !fast_forward) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("replication slot \"%s\" was not created in this database", @@ -526,7 +527,7 @@ CreateDecodingContext(XLogRecPtr start_lsn, * Do not allow consumption of a "synchronized" slot until the standby * gets promoted. */ - if (RecoveryInProgress() && slot->data.synced) + if (RecoveryInProgress() && slot->data.synced && !IsSyncingReplicationSlots()) Add comments at both of the above places. 2. +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr moveto, + bool *found_consistent_point); + This API looks a bit awkward as the functionality doesn't match the name. How about having a function with name LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto, ready_for_decoding) with the same functionality as your patch has for pg_logical_replication_slot_advance() and then invoke it both from pg_logical_replication_slot_advance and slotsync.c. The function name is too big, we can think of a shorter name. Any ideas? -- With Regards, Amit Kapila.
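Just to illustrate the alternative approach at the SQL level, the slotsync worker would essentially have to do something like the following against the primary. The file name here is a made-up example (serialized snapshots live under pg_logical/snapshots/ and are named after their LSN), so treat this only as a sketch:

-- hypothetical snapshot file corresponding to the remote slot's restart_lsn
SELECT pg_read_binary_file('pg_logical/snapshots/0-3019A38.snap');

And, as noted above, the file may simply not exist (not created yet, or already removed by the checkpointer), which is why advancing the slot is still needed in those cases.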
Re: Synchronizing slots from primary to standby
Hi, On Fri, Mar 29, 2024 at 01:06:15AM +, Zhijie Hou (Fujitsu) wrote: > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so > that > we can analyze the possible CFbot failures easily. > Thanks! + if (remote_slot->confirmed_lsn != slot->data.confirmed_flush) + { + /* +* By advancing the restart_lsn, confirmed_lsn, and xmin using +* fast-forward logical decoding, we ensure that the required snapshots +* are saved to disk. This enables logical decoding to quickly reach a +* consistent point at the restart_lsn, eliminating the risk of missing +* data during snapshot creation. +*/ + pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, + found_consistent_point); + ReplicationSlotsComputeRequiredLSN(); + updated_lsn = true; + } Instead of using pg_logical_replication_slot_advance() for each synced slot and during sync cycles what about?: - keep sync slot synchronization as it is currently (not using pg_logical_replication_slot_advance()) - create "an hidden" logical slot if sync slot feature is on - at the time of promotion use pg_logical_replication_slot_advance() on this hidden slot only to advance to the max lsn of the synced slots I'm not sure that would be enough, just asking your thoughts on this (benefits would be to avoid calling pg_logical_replication_slot_advance() on each sync slots and during the sync cycles). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Synchronizing slots from primary to standby
Dear Hou, Thanks for updating the patch! Here is a comment for it. ``` +/* + * By advancing the restart_lsn, confirmed_lsn, and xmin using + * fast-forward logical decoding, we can verify whether a consistent + * snapshot can be built. This process also involves saving necessary + * snapshots to disk during decoding, ensuring that logical decoding + * efficiently reaches a consistent point at the restart_lsn without + * the potential loss of data during snapshot creation. + */ +pg_logical_replication_slot_advance(remote_slot->confirmed_lsn, +found_consistent_point); +ReplicationSlotsComputeRequiredLSN(); +updated_lsn = true; ``` You added them like pg_replication_slot_advance(), but the function also calls ReplicationSlotsComputeRequiredXmin(false) at that time. According to the related commit b48df81 and discussions [1], I know it is needed only for physical slots, but it makes more consistent to call requiredXmin() as well, per [2]: ``` This may be a waste if no advancing is done, but it could also be an advantage to enforce a recalculation of the thresholds for each function call. And that's more consistent with the slot copy, drop and creation. ``` How do you think? [1]: https://www.postgresql.org/message-id/20200609171904.kpltxxvjzislidks%40alap3.anarazel.de [2]: https://www.postgresql.org/message-id/20200616072727.GA2361%40paquier.xyz Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Synchronizing slots from primary to standby
On Fri, Mar 29, 2024 at 6:36 AM Zhijie Hou (Fujitsu) wrote: > > Attach a new version patch which fixed an un-initialized variable issue and > added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so > that > we can analyze the possible CFbot failures easily. As suggested by Amit in [1], for the fix being discussed where we need to advance the synced slot on standby, we need to skip the dbid check in fast_forward mode in CreateDecodingContext(). We tried few tests to make sure that there was no table-access done during fast-forward mode 1) Initially we tried avoiding database-id check in CreateDecodingContext() only when called by pg_logical_replication_slot_advance(). 'make check-world' passed on HEAD for the same. 2) But the more generic solution was to skip the database check if "fast_forward" is true. It was tried and 'make check-world' passed on HEAD for that as well. 3) Another thing tried by Hou-San was to run pgbench after skipping db check in the fast_forward logical decoding case. pgbench was run to generate some changes and then the logical slot was advanced to the latest position in another database. A LOG was added in relation_open to catch table access. It was found that there was no table-access in fast forward logical decoding i.e. no LOGS for table-open were generated during the test. Steps given at [2] [1]: https://www.postgresql.org/message-id/CAA4eK1KMiKangJa4NH_K1oFc87Y01n3rnpuwYagT59Y%3DADW8Dw%40mail.gmail.com [2]: -- 1. apply the DEBUG patch (attached as .txt) which will log the relation open and table cache access. 2. create a slot: SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true); 3. run pgbench to generate some data. pgbench -i postgres pgbench --aggregate-interval=5 --time=5 --client=10 --log --rate=1000 --latency-limit=10 --failures-detailed --max-tries=10 postgres 4. start a fresh session in a different db and advance the slot to the latest position. There should be no relation open or CatCache log between the LOG "starting logical decoding for slot .." and LOG "decoding over". 
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn()); -- thanks Shveta From 5386894faa14c0de9854e0eee9679f8eea775f65 Mon Sep 17 00:00:00 2001 From: Hou Zhijie Date: Fri, 29 Mar 2024 11:46:36 +0800 Subject: [PATCH] debug log --- src/backend/access/common/relation.c | 2 ++ src/backend/replication/slotfuncs.c | 1 + src/backend/utils/cache/catcache.c | 1 + 3 files changed, 4 insertions(+) diff --git a/src/backend/access/common/relation.c b/src/backend/access/common/relation.c index d8a313a2c9..40718fc47e 100644 --- a/src/backend/access/common/relation.c +++ b/src/backend/access/common/relation.c @@ -50,6 +50,7 @@ relation_open(Oid relationId, LOCKMODE lockmode) Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES); + elog(LOG, "relation_open"); /* Get the lock before trying to open the relcache entry */ if (lockmode != NoLock) LockRelationOid(relationId, lockmode); @@ -91,6 +92,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode) Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES); + elog(LOG, "try_relation_open"); /* Get the lock first */ if (lockmode != NoLock) LockRelationOid(relationId, lockmode); diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index ef5081784c..564b36fc45 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -608,6 +608,7 @@ pg_logical_replication_slot_advance(XLogRecPtr moveto, /* free context, call shutdown callback */ FreeDecodingContext(ctx); + elog(LOG, "decoding over"); InvalidateSystemCaches(); } diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 569f51cb33..e19c586697 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -1328,6 +1328,7 @@ SearchCatCacheInternal(CatCache *cache, Assert(cache->cc_nkeys == nkeys); + elog(LOG, "SearchCatCacheInternal"); /* * one-time startup overhead for each cache */ -- 2.31.1
RE: Synchronizing slots from primary to standby
On Thursday, March 28, 2024 10:02 PM Zhijie Hou (Fujitsu) wrote: > > On Thursday, March 28, 2024 7:32 PM Amit Kapila > wrote: > > > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > > > > wrote: > > > > > > When analyzing one BF error[1], we find an issue of slotsync: Since > > > we don't perform logical decoding for the synced slots when syncing > > > the lsn/xmin of slot, no logical snapshots will be serialized to > > > disk. So, when user starts to use these synced slots after > > > promotion, it needs to re-build the consistent snapshot from the > > > restart_lsn if the > > > WAL(xl_running_xacts) at restart_lsn position indicates that there > > > are running transactions. This however could cause the data that > > > before the > > consistent point to be missed[2]. > > > > > > This issue doesn't exist on the primary because the snapshot at > > > restart_lsn should have been serialized to disk > > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the > > > logical decoding restarts, it can find consistent snapshot > > > immediately at > > restart_lsn. > > > > > > To fix this, we could use the fast forward logical decoding to > > > advance the synced slot's lsn/xmin when syncing these values instead > > > of directly updating the slot's info. This way, the snapshot will be > > > serialized to > > disk when decoding. > > > If we could not reach to the consistent point at the remote > > > restart_lsn, the slot is marked as temp and will be persisted once > > > it reaches the consistent point. I am still analyzing the fix and > > > will share once > > ready. > > > > > > > Yes, we can use this but one thing to note is that > > CreateDecodingContext() will expect that the slot's and current > > database are the same. I think the reason for that is we need to check > > system tables of the current database while decoding and sending data > > to the output_plugin which won't be a requirement for the fast_forward > > case. So, we need to skip that check in fast_forward mode. > > Agreed. > > > > > Next, I was thinking about the case of the first time updating the > > restart and confirmed_flush LSN while syncing the slots. I think we > > can keep the current logic as it is based on the following analysis. > > > > For each logical slot, cases possible on the primary: > > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet > > reached the consistent point. > > 2. The restart_lsn doesn't have a serialized snapshot but has reached > > a consistent point. > > 3. The restart_lsn has a serialized snapshot which means it has > > reached a consistent point as well. > > > > Considering we keep the logic to reserve initial WAL positions the > > same as the current (Reserve WAL for the currently active local slot > > using the specified WAL location (restart_lsn). If the given WAL > > location has been removed, reserve WAL using the oldest existing WAL > > segment.), I could think of the below > > scenarios: > > A. For 1, we shouldn't sync the slot as it still wouldn't have been > > marked persistent on the primary. > > B. For 2, we would sync the slot > >B1. If remote_restart_lsn >= local_resart_lsn, then advance the > > slot by calling pg_logical_replication_slot_advance(). > >B11. If we reach consistent point, then it should be okay > > because after promotion as well we should reach consistent point. > > B111. But again is it possible that there is some xact > > that comes before consistent_point on primary and the same is after > > consistent_point on standby? 
This shouldn't matter as we will start > > decoding transactions after confirmed_flush_lsn which would be the same on > primary and standby. > >B22. If we haven't reached consistent_point, then we won't mark > > the slot as persistent, and at the next sync we will do the same till > > it reaches consistent_point. At that time, the situation will be similar to > > B11. > >B2. If remote_restart_lsn < local_restart_lsn, then we will wait > > for the next sync cycle and keep the slot as temporary. Once in the > > next or some consecutive sync cycle, we reach the condition > > remote_restart_lsn >= local_restart_lsn, we will proceed to advance > > the slot and we should have the same behavior as B1. > > C. For 3, we would sync the slot, but the behavior should be the same as B. > > > > Thoughts? > > Looks reasonable to me. > > Here is the patch based on above lines. > I am also testing and verifying the patch locally. Attach a new version patch which fixed an un-initialized variable issue and added some comments. Also, temporarily enable DEBUG2 for the 040 tap-test so that we can analyze the possible CFbot failures easily. Best Regards, Hou zj v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch Description: v2-0001-advance-the-restart_lsn-of-synced-slots-using-log.patch
RE: Synchronizing slots from primary to standby
On Thursday, March 28, 2024 7:32 PM Amit Kapila wrote: > > On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) > wrote: > > > > When analyzing one BF error[1], we find an issue of slotsync: Since we > > don't perform logical decoding for the synced slots when syncing the > > lsn/xmin of slot, no logical snapshots will be serialized to disk. So, > > when user starts to use these synced slots after promotion, it needs > > to re-build the consistent snapshot from the restart_lsn if the > > WAL(xl_running_xacts) at restart_lsn position indicates that there are > > running transactions. This however could cause the data that before the > consistent point to be missed[2]. > > > > This issue doesn't exist on the primary because the snapshot at > > restart_lsn should have been serialized to disk > > (SnapBuildProcessRunningXacts -> SnapBuildSerialize), so even if the > > logical decoding restarts, it can find consistent snapshot immediately at > restart_lsn. > > > > To fix this, we could use the fast forward logical decoding to advance > > the synced slot's lsn/xmin when syncing these values instead of > > directly updating the slot's info. This way, the snapshot will be > > serialized to > disk when decoding. > > If we could not reach to the consistent point at the remote > > restart_lsn, the slot is marked as temp and will be persisted once it > > reaches the consistent point. I am still analyzing the fix and will share > > once > ready. > > > > Yes, we can use this but one thing to note is that > CreateDecodingContext() will expect that the slot's and current database are > the same. I think the reason for that is we need to check system tables of the > current database while decoding and sending data to the output_plugin which > won't be a requirement for the fast_forward case. So, we need to skip that > check in fast_forward mode. Agreed. > > Next, I was thinking about the case of the first time updating the restart and > confirmed_flush LSN while syncing the slots. I think we can keep the current > logic as it is based on the following analysis. > > For each logical slot, cases possible on the primary: > 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet reached > the > consistent point. > 2. The restart_lsn doesn't have a serialized snapshot but has reached a > consistent point. > 3. The restart_lsn has a serialized snapshot which means it has reached a > consistent point as well. > > Considering we keep the logic to reserve initial WAL positions the same as the > current (Reserve WAL for the currently active local slot using the specified > WAL > location (restart_lsn). If the given WAL location has been removed, reserve > WAL using the oldest existing WAL segment.), I could think of the below > scenarios: > A. For 1, we shouldn't sync the slot as it still wouldn't have been marked > persistent on the primary. > B. For 2, we would sync the slot >B1. If remote_restart_lsn >= local_resart_lsn, then advance the slot by > calling > pg_logical_replication_slot_advance(). >B11. If we reach consistent point, then it should be okay because after > promotion as well we should reach consistent point. > B111. But again is it possible that there is some xact that comes > before consistent_point on primary and the same is after consistent_point on > standby? This shouldn't matter as we will start decoding transactions after > confirmed_flush_lsn which would be the same on primary and standby. >B22. 
If we haven't reached consistent_point, then we won't mark the > slot > as persistent, and at the next sync we will do the same till it reaches > consistent_point. At that time, the situation will be similar to B11. >B2. If remote_restart_lsn < local_restart_lsn, then we will wait for the > next > sync cycle and keep the slot as temporary. Once in the next or some > consecutive sync cycle, we reach the condition remote_restart_lsn >= > local_restart_lsn, we will proceed to advance the slot and we should have the > same behavior as B1. > C. For 3, we would sync the slot, but the behavior should be the same as B. > > Thoughts? Looks reasonable to me. Here is the patch based on above lines. I am also testing and verifying the patch locally. Best Regards, Hou zj 0001-advance-the-restart_lsn-of-synced-slots-using-logica.patch Description: 0001-advance-the-restart_lsn-of-synced-slots-using-logica.patch
Re: Synchronizing slots from primary to standby
Hi, On Thu, Mar 28, 2024 at 05:05:35PM +0530, Amit Kapila wrote: > On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot > wrote: > > > > On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > > > > > To fix this, we could use the fast forward logical decoding to advance > > > the synced > > > slot's lsn/xmin when syncing these values instead of directly updating the > > > slot's info. This way, the snapshot will be serialized to disk when > > > decoding. > > > If we could not reach to the consistent point at the remote restart_lsn, > > > the > > > slot is marked as temp and will be persisted once it reaches the > > > consistent > > > point. I am still analyzing the fix and will share once ready. > > > > Thanks! I'm wondering about the performance impact (even in fast_forward > > mode), > > might be worth to keep an eye on it. > > > > True, we can consider performance but correctness should be a > priority, Yeah of course. > and can we think of a better way to fix this issue? I'll keep you posted if there is one that I can think of. > > Should we create a 17 open item [1]? > > > > [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items > > > > Yes, we can do that. done. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Synchronizing slots from primary to standby
On Thu, Mar 28, 2024 at 3:34 PM Bertrand Drouvot wrote: > > On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > > > To fix this, we could use the fast forward logical decoding to advance the > > synced > > slot's lsn/xmin when syncing these values instead of directly updating the > > slot's info. This way, the snapshot will be serialized to disk when > > decoding. > > If we could not reach to the consistent point at the remote restart_lsn, the > > slot is marked as temp and will be persisted once it reaches the consistent > > point. I am still analyzing the fix and will share once ready. > > Thanks! I'm wondering about the performance impact (even in fast_forward > mode), > might be worth to keep an eye on it. > True, we can consider performance but correctness should be a priority, and can we think of a better way to fix this issue? > Should we create a 17 open item [1]? > > [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items > Yes, we can do that. -- With Regards, Amit Kapila.
Re: Synchronizing slots from primary to standby
On Thu, Mar 28, 2024 at 10:08 AM Zhijie Hou (Fujitsu) wrote: > > When analyzing one BF error[1], we find an issue of slotsync: Since we don't > perform logical decoding for the synced slots when syncing the lsn/xmin of > slot, no logical snapshots will be serialized to disk. So, when user starts to > use these synced slots after promotion, it needs to re-build the consistent > snapshot from the restart_lsn if the WAL(xl_running_xacts) at restart_lsn > position indicates that there are running transactions. This however could > cause the data that before the consistent point to be missed[2]. > > This issue doesn't exist on the primary because the snapshot at restart_lsn > should have been serialized to disk (SnapBuildProcessRunningXacts -> > SnapBuildSerialize), so even if the logical decoding restarts, it can find > consistent snapshot immediately at restart_lsn. > > To fix this, we could use the fast forward logical decoding to advance the > synced > slot's lsn/xmin when syncing these values instead of directly updating the > slot's info. This way, the snapshot will be serialized to disk when decoding. > If we could not reach to the consistent point at the remote restart_lsn, the > slot is marked as temp and will be persisted once it reaches the consistent > point. I am still analyzing the fix and will share once ready. > Yes, we can use this but one thing to note is that CreateDecodingContext() will expect that the slot's and current database are the same. I think the reason for that is we need to check system tables of the current database while decoding and sending data to the output_plugin which won't be a requirement for the fast_forward case. So, we need to skip that check in fast_forward mode. Next, I was thinking about the case of the first time updating the restart and confirmed_flush LSN while syncing the slots. I think we can keep the current logic as it is based on the following analysis. For each logical slot, cases possible on the primary: 1. The restart_lsn doesn't have a serialized snapshot and hasn't yet reached the consistent point. 2. The restart_lsn doesn't have a serialized snapshot but has reached a consistent point. 3. The restart_lsn has a serialized snapshot which means it has reached a consistent point as well. Considering we keep the logic to reserve initial WAL positions the same as the current (Reserve WAL for the currently active local slot using the specified WAL location (restart_lsn). If the given WAL location has been removed, reserve WAL using the oldest existing WAL segment.), I could think of the below scenarios: A. For 1, we shouldn't sync the slot as it still wouldn't have been marked persistent on the primary. B. For 2, we would sync the slot B1. If remote_restart_lsn >= local_resart_lsn, then advance the slot by calling pg_logical_replication_slot_advance(). B11. If we reach consistent point, then it should be okay because after promotion as well we should reach consistent point. B111. But again is it possible that there is some xact that comes before consistent_point on primary and the same is after consistent_point on standby? This shouldn't matter as we will start decoding transactions after confirmed_flush_lsn which would be the same on primary and standby. B22. If we haven't reached consistent_point, then we won't mark the slot as persistent, and at the next sync we will do the same till it reaches consistent_point. At that time, the situation will be similar to B11. B2. 
If remote_restart_lsn < local_restart_lsn, then we will wait for the next sync cycle and keep the slot as temporary. Once in the next or some consecutive sync cycle, we reach the condition remote_restart_lsn >= local_restart_lsn, we will proceed to advance the slot and we should have the same behavior as B1. C. For 3, we would sync the slot, but the behavior should be the same as B. Thoughts? -- With Regards, Amit Kapila.
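For reference, the remote values being compared in the cases above are roughly what the following returns on the primary; this is only an approximation of what the slotsync worker actually fetches, and it assumes the failover column added in PG17:

SELECT slot_name, restart_lsn, confirmed_flush_lsn, catalog_xmin
  FROM pg_replication_slots
 WHERE failover AND NOT temporary;

The B1/B2 branches then come down to comparing these remote LSNs against the corresponding local (synced) slot's values.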
Re: Synchronizing slots from primary to standby
Hi, On Thu, Mar 28, 2024 at 04:38:19AM +, Zhijie Hou (Fujitsu) wrote: > Hi, > > When analyzing one BF error[1], we find an issue of slotsync: Since we don't > perform logical decoding for the synced slots when syncing the lsn/xmin of > slot, no logical snapshots will be serialized to disk. So, when user starts to > use these synced slots after promotion, it needs to re-build the consistent > snapshot from the restart_lsn if the WAL(xl_running_xacts) at restart_lsn > position indicates that there are running transactions. This however could > cause the data that before the consistent point to be missed[2]. I see, nice catch and explanation, thanks! > This issue doesn't exist on the primary because the snapshot at restart_lsn > should have been serialized to disk (SnapBuildProcessRunningXacts -> > SnapBuildSerialize), so even if the logical decoding restarts, it can find > consistent snapshot immediately at restart_lsn. Right. > To fix this, we could use the fast forward logical decoding to advance the > synced > slot's lsn/xmin when syncing these values instead of directly updating the > slot's info. This way, the snapshot will be serialized to disk when decoding. > If we could not reach to the consistent point at the remote restart_lsn, the > slot is marked as temp and will be persisted once it reaches the consistent > point. I am still analyzing the fix and will share once ready. Thanks! I'm wondering about the performance impact (even in fast_forward mode), might be worth to keep an eye on it. Should we create a 17 open item [1]? [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
RE: Synchronizing slots from primary to standby
Hi,

When analyzing one BF error[1], we found an issue with slotsync: Since we don't
perform logical decoding for the synced slots when syncing the lsn/xmin of a
slot, no logical snapshots will be serialized to disk. So, when a user starts to
use these synced slots after promotion, the consistent snapshot needs to be
re-built from the restart_lsn if the WAL (xl_running_xacts) at the restart_lsn
position indicates that there are running transactions. This however could
cause the data from before the consistent point to be missed[2].

This issue doesn't exist on the primary because the snapshot at restart_lsn
should have been serialized to disk (SnapBuildProcessRunningXacts ->
SnapBuildSerialize), so even if the logical decoding restarts, it can find a
consistent snapshot immediately at restart_lsn.

To fix this, we could use fast-forward logical decoding to advance the synced
slot's lsn/xmin when syncing these values instead of directly updating the
slot's info. This way, the snapshot will be serialized to disk when decoding.
If we could not reach the consistent point at the remote restart_lsn, the
slot is marked as temporary and will be persisted once it reaches the
consistent point. I am still analyzing the fix and will share it once ready.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-03-19%2010%3A03%3A06

[2] The steps to reproduce the data miss issue on a primary->standby setup:

Note, we need to set LOG_SNAPSHOT_INTERVAL_MS to a bigger number(150) to
prevent a concurrent LogStandbySnapshot() call, and enable
sync_replication_slots on the standby.

1. Create a failover logical slot on the primary.
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true);

2. Use the following steps to advance the restart_lsn of the failover slot to a
position where the xl_running_xacts at that position indicates that there is a
running transaction.

TXN1
BEGIN; create table dummy1(a int);
TXN2
SELECT pg_log_standby_snapshot();
TXN1
COMMIT;

TXN1
BEGIN; create table dummy2(a int);
TXN2
SELECT pg_log_standby_snapshot();
TXN1
COMMIT;

-- the restart_lsn will be advanced to a position where there was 1 running transaction. And we need to wait for the restart_lsn to be synced to the standby.
SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());

-- insert some data here before calling the next pg_log_standby_snapshot().
INSERT INTO reptable VALUES(999);

3. Promote the standby and try to consume the change (999) from the synced slot
on the standby. We will find that no change is returned.

select * from pg_logical_slot_get_changes('logicalslot', NULL, NULL);

Best Regards,
Hou zj
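One way to see that difference from SQL is to compare the serialized snapshots on both nodes; a sketch using pg_ls_logicalsnapdir(), available since PG15:

-- run this on the primary and on the standby and compare the output
SELECT name, size, modification FROM pg_ls_logicalsnapdir();

On the primary this lists the <lsn>.snap files written by SnapBuildSerialize; on the standby, with the slotsync behaviour described above, nothing is written for the synced slots, which is why decoding after promotion has to rebuild the snapshot.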
Re: Synchronizing slots from primary to standby
Hi,

On Thu, Mar 14, 2024 at 02:22:44AM +, Zhijie Hou (Fujitsu) wrote:
> Hi,
>
> Since the standby_slot_names patch has been committed, I am attaching the last
> doc patch for review.
>

Thanks!

1 ===

+ continue subscribing to publications now on the new primary server without
+ any data loss.

I think "without any data loss" should be re-worded in this context. Data loss
in the sense "data committed on the primary and not visible on the subscriber
in case of failover" can still occur (in case synchronous replication is not
used).

2 ===

+ If the result (failover_ready) of both above steps is
+ true, existing subscriptions will be able to continue without data loss.
+

I don't think that's true if synchronous replication is not used. Say,

- synchronous replication is not used
- the primary is not able to reach the standby anymore and standby_slot_names is set
- new data is inserted into the primary
- then it is not replicated to the subscriber (due to standby_slot_names)

Then I think both of the above steps will return true but data would be lost in
case of failover.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
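To make that concrete, the remaining window only closes if the physical standby is a synchronous one, e.g. something along these lines on the primary (the standby name 'sb1' is just a placeholder):

ALTER SYSTEM SET synchronous_standby_names = 'sb1';
SELECT pg_reload_conf();

With asynchronous replication, both of the documented checks can report failover_ready as true while a transaction committed on the old primary has not reached the standby yet.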
RE: Synchronizing slots from primary to standby
Hi, Since the standby_slot_names patch has been committed, I am attaching the last doc patch for review. Best Regards, Hou zj v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch Description: v109-0001-Document-the-steps-to-check-if-the-standby-is-r.patch
Re: Synchronizing slots from primary to standby
On Fri, Mar 8, 2024 at 9:56 AM Ajin Cherian wrote: > >> Pushed with minor modifications. I'll keep an eye on BF. >> >> BTW, one thing that we should try to evaluate a bit more is the >> traversal of slots in StandbySlotsHaveCaughtup() where we verify if >> all the slots mentioned in standby_slot_names have received the >> required WAL. Even if the standby_slot_names list is short the total >> number of slots can be much larger which can lead to an increase in >> CPU usage during traversal. There is an optimization that allows to >> cache ss_oldest_flush_lsn and ensures that we don't need to traverse >> the slots each time so it may not hit frequently but still there is a >> chance. I see it is possible to further optimize this area by caching >> the position of each slot mentioned in standby_slot_names in >> replication_slots array but not sure whether it is worth. >> >> > > I tried to test this by configuring a large number of logical slots while > making sure the standby slots are at the end of the array and checking if > there was any performance hit in logical replication from these searches. > Thanks Ajin and Nisha. We also plan: 1) Redoing XLogSendLogical time-log related test with 'sync_replication_slots' enabled. 2) pg_recvlogical test to monitor lag in StandbySlotsHaveCaughtup() for a large number of slots. 3) Profiling to see if StandbySlotsHaveCaughtup() is noticeable in the report when there are a large number of slots to traverse. thanks Shveta
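For the lag monitoring in point 2), besides pg_recvlogical, a periodic query like the following on the primary can be used while the load runs (just a sketch; which slots to watch is up to the test):

SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
  FROM pg_replication_slots
 WHERE slot_type = 'logical';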
Re: Synchronizing slots from primary to standby
On Fri, Mar 8, 2024 at 2:33 PM Amit Kapila wrote: > On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu) > wrote: > > > > > > Attach the V108 patch set which addressed above and Peter's comments. > > I also removed the check for "*" in guc check hook. > > > > > Pushed with minor modifications. I'll keep an eye on BF. > > BTW, one thing that we should try to evaluate a bit more is the > traversal of slots in StandbySlotsHaveCaughtup() where we verify if > all the slots mentioned in standby_slot_names have received the > required WAL. Even if the standby_slot_names list is short the total > number of slots can be much larger which can lead to an increase in > CPU usage during traversal. There is an optimization that allows to > cache ss_oldest_flush_lsn and ensures that we don't need to traverse > the slots each time so it may not hit frequently but still there is a > chance. I see it is possible to further optimize this area by caching > the position of each slot mentioned in standby_slot_names in > replication_slots array but not sure whether it is worth. > > >

I tried to test this by configuring a large number of logical slots while making sure the standby slots are at the end of the array and checking if there was any performance hit in logical replication from these searches.

Setup:
1. 1 primary server configured with 3 servers in the standby_slot_names, 1 extra logical slot (not configured for failover) + 1 logical subscriber configured as failover + 3 physical standbys (all configured to sync logical slots)
2. 1 primary server configured with 3 servers in the standby_slot_names, 100 extra logical slots (not configured for failover) + 1 logical subscriber configured as failover + 3 physical standbys (all configured to sync logical slots)
3. 1 primary server configured with 3 servers in the standby_slot_names, 500 extra logical slots (not configured for failover) + 1 logical subscriber configured as failover + 3 physical standbys (all configured to sync logical slots)

In the three setups, the 3 slots in standby_slot_names are compared with lists of 2, 101 and 501 slots respectively. I ran pgbench for 15 minutes for all 3 setups:

Case 1: Average TPS - 8.143399 TPS
Case 2: Average TPS - 8.187462 TPS
Case 3: Average TPS - 8.190611 TPS

I see no degradation in the performance; the differences in performance are well within the run-to-run variations seen.

Nisha also did some performance tests to record the lag introduced by the traversal of a large number of slots in StandbySlotsHaveCaughtup(). The tests logged the time at the start and end of the XLogSendLogical() call (which eventually calls WalSndWaitForWal() --> StandbySlotsHaveCaughtup()) and calculated the total time taken by this function during the load run for different total slot counts.

Setup:
-- one primary with 3 standbys and one subscriber with one active subscription
-- hot_standby_feedback=off and sync_replication_slots=false
-- made sure the standby slots remain at the end of the ReplicationSlotCtl->replication_slots array to measure the performance of the worst-case scenario for the standby slot search in StandbySlotsHaveCaughtup()

pgbench for 15 min was run. Here is the data:

Case 1: with 1 logical slot, standby_slot_names having 3 slots
Run1: 626.141642 secs
Run2: 631.930254 secs

Case 2: with 100 logical slots, standby_slot_names having 3 slots
Run1: 629.38332 secs
Run2: 630.548432 secs

Case 3: with 500 logical slots, standby_slot_names having 3 slots
Run1: 629.910829 secs
Run2: 627.924183 secs

There was no degradation in performance seen.
Thanks Nisha for helping with the testing. regards, Ajin Cherian Fujitsu Australia
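As an aside, the "extra" non-failover logical slots in the setups above can be created in bulk with something like the following; the slot names are just placeholders, and max_replication_slots has to be raised accordingly:

-- create 500 dummy logical slots for the worst-case traversal test
SELECT pg_create_logical_replication_slot('dummy_slot_' || g, 'test_decoding')
  FROM generate_series(1, 500) AS g;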
Re: Synchronizing slots from primary to standby
On Thu, Mar 7, 2024 at 12:00 PM Zhijie Hou (Fujitsu) wrote: > > > Attach the V108 patch set which addressed above and Peter's comments. > I also removed the check for "*" in guc check hook. > Pushed with minor modifications. I'll keep an eye on BF. BTW, one thing that we should try to evaluate a bit more is the traversal of slots in StandbySlotsHaveCaughtup() where we verify if all the slots mentioned in standby_slot_names have received the required WAL. Even if the standby_slot_names list is short the total number of slots can be much larger which can lead to an increase in CPU usage during traversal. There is an optimization that allows to cache ss_oldest_flush_lsn and ensures that we don't need to traverse the slots each time so it may not hit frequently but still there is a chance. I see it is possible to further optimize this area by caching the position of each slot mentioned in standby_slot_names in replication_slots array but not sure whether it is worth. -- With Regards, Amit Kapila.
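As a rough way to gauge how long that loop can get on a given installation, it is enough to compare the configured list with the total number of slots (nothing here is specific to the patch):

SHOW standby_slot_names;
SELECT count(*) AS total_slots FROM pg_replication_slots;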
RE: Synchronizing slots from primary to standby
On Thursday, March 7, 2024 12:46 PM Amit Kapila wrote: > > On Thu, Mar 7, 2024 at 7:35 AM Peter Smith > wrote: > > > > Here are some review comments for v107-0001 > > > > == > > src/backend/replication/slot.c > > > > 1. > > +/* > > + * Struct for the configuration of standby_slot_names. > > + * > > + * Note: this must be a flat representation that can be held in a single > > chunk > > + * of guc_malloc'd memory, so that it can be stored as the "extra" data for > the > > + * standby_slot_names GUC. > > + */ > > +typedef struct > > +{ > > + int slot_num; > > + > > + /* slot_names contains nmembers consecutive nul-terminated C strings */ > > + char slot_names[FLEXIBLE_ARRAY_MEMBER]; > > +} StandbySlotConfigData; > > + > > > > 1a. > > To avoid any ambiguity this 1st field is somehow a slot ID number, I > > felt a better name would be 'nslotnames' or even just 'n' or 'count', > > > > We can probably just add a comment above slot_num and that should be > sufficient but I am fine with 'nslotnames' as well, in anycase let's > add a comment for the same. Added. > > > > > 6b. > > IMO this function would be tidier written such that the > > MyReplicationSlot->data.name is passed as a parameter. Then you can > > name the function more naturally like: > > > > IsSlotInStandbySlotNames(const char *slot_name) > > > > +1. How about naming it as SlotExistsinStandbySlotNames(char > *slot_name) and pass the slot_name from MyReplicationSlot? Otherwise, > we need an Assert for MyReplicationSlot in this function. Changed as suggested. > > Also, can we add a comment like below before the loop: > + /* > + * XXX: We are not expecting this list to be long so a linear search > + * shouldn't hurt but if that turns out not to be true then we can cache > + * this information for each WalSender as well. > + */ Added. Attach the V108 patch set which addressed above and Peter's comments. I also removed the check for "*" in guc check hook. Best Regards, Hou zj v108-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch Description: v108-0001-Allow-logical-walsenders-to-wait-for-the-physic.patch
RE: Synchronizing slots from primary to standby
On Thursday, March 7, 2024 10:05 AM Peter Smith wrote: > > Here are some review comments for v107-0001 Thanks for the comments. > > == > src/backend/replication/slot.c > > 1. > +/* > + * Struct for the configuration of standby_slot_names. > + * > + * Note: this must be a flat representation that can be held in a > +single chunk > + * of guc_malloc'd memory, so that it can be stored as the "extra" data > +for the > + * standby_slot_names GUC. > + */ > +typedef struct > +{ > + int slot_num; > + > + /* slot_names contains nmembers consecutive nul-terminated C strings > +*/ char slot_names[FLEXIBLE_ARRAY_MEMBER]; > +} StandbySlotConfigData; > + > > 1a. > To avoid any ambiguity this 1st field is somehow a slot ID number, I felt a > better > name would be 'nslotnames' or even just 'n' or 'count', Changed to 'nslotnames'. > > ~ > > 1b. > (fix typo) > > SUGGESTION for the 2nd field comment > slot_names is a chunk of 'n' X consecutive null-terminated C strings Changed. > > ~ > > 1c. > A more explanatory name for this typedef maybe is > 'StandbySlotNamesConfigData' ? Changed. > > ~~~ > > > 2. > +/* This is parsed and cached configuration for standby_slot_names */ > +static StandbySlotConfigData *standby_slot_config; > > > 2a. > /This is parsed and cached configuration for .../This is the parsed and cached > configuration for .../ Changed. > > ~ > > 2b. > Similar to above -- since this only has name information maybe it is more > correct to call it 'standby_slot_names_config'? > Changed. > ~~~ > > 3. > +/* > + * A helper function to validate slots specified in GUC standby_slot_names. > + * > + * The rawname will be parsed, and the parsed result will be saved into > + * *elemlist. > + */ > +static bool > +validate_standby_slots(char *rawname, List **elemlist) > > /and the parsed result/and the result/ > Changed. > ~~~ > > 4. check_standby_slot_names > > + /* Need a modifiable copy of string */ rawname = pstrdup(*newval); > > /copy of string/copy of the GUC string/ > Changed. > ~~~ > > 5. > +assign_standby_slot_names(const char *newval, void *extra) { > + /* > + * The standby slots may have changed, so we must recompute the oldest > + * LSN. > + */ > + ss_oldest_flush_lsn = InvalidXLogRecPtr; > + > + standby_slot_config = (StandbySlotConfigData *) extra; } > > To avoid leaking don't we need to somewhere take care to free any memory > used by a previous value (if any) of this 'standby_slot_config'? > The memory of extra is maintained by the GUC mechanism. It will be automatically freed when the associated GUC setting is no longer of interest. See src/backend/utils/misc/README for details. > ~~~ > > 6. AcquiredStandbySlot > > +/* > + * Return true if the currently acquired slot is specified in > + * standby_slot_names GUC; otherwise, return false. > + */ > +bool > +AcquiredStandbySlot(void) > +{ > + const char *name; > + > + /* Return false if there is no value in standby_slot_names */ if > + (standby_slot_config == NULL) return false; > + > + name = standby_slot_config->slot_names; for (int i = 0; i < > + standby_slot_config->slot_num; i++) { if (strcmp(name, > + NameStr(MyReplicationSlot->data.name)) == 0) return true; > + > + name += strlen(name) + 1; > + } > + > + return false; > +} > > 6a. > Just checking "(standby_slot_config == NULL)" doesn't seem enough to me, > because IIUIC it is possible when 'standby_slot_names' has no value then > maybe standby_slot_config is not NULL but standby_slot_config->slot_num is > 0. The standby_slot_config will always be NULL if there is no value in it. 
While checking, I did find a rare case that if there are only some white space in the standby_slot_names, then slot_num will be 0, and have fixed it so that standby_slot_config will always be NULL if there is no meaning value in guc. > > ~ > > 6b. > IMO this function would be tidier written such that the > MyReplicationSlot->data.name is passed as a parameter. Then you can > name the function more naturally like: > > IsSlotInStandbySlotNames(const char *slot_name) Changed it to SlotExistsInStandbySlotNames. > > ~ > > 6c. > IMO the body of the function will be tidier if written so there are only 2 > returns > instead of 3 like > > SUGGESTION: > if (...) > { > for (...) > { > ... > return true; > } > } > return false; I personally prefer the current style. > > ~~~ > > 7. > + /* > + * Don't need to wait for the standbys to catch up if there is no value > + in > + * standby_slot_names. > + */ > + if (standby_slot_config == NULL) > + return true; > > (similar to a previous review comment) > > This check doesn't seem enough because IIUIC it is possible when > 'standby_slot_names' has no value then maybe standby_slot_config is not NULL > but standby_slot_config->slot_num is 0. Same as above. > > ~~~ > > 8. WaitForStandbyConfirmation > > + /* > + * Don't need to wait for the standby to catch up if the current > + acquired > + * slot is not a logica
Re: Synchronizing slots from primary to standby
On Thu, Mar 7, 2024 at 8:37 AM shveta malik wrote: > I thought about whether we can make standby_slot_names USERSET instead of SIGHUP, and it doesn't sound like a good idea, as that can lead to inconsistent standby replicas even after configuring the correct value of standby_slot_names. One can set a different or '' (empty) value for a particular session and consume all changes from the slot without waiting for the standby to acknowledge the change. Also, it would be difficult for users to ensure that the standby is always ahead of the subscribers. Does anyone think differently? -- With Regards, Amit Kapila.
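To spell out the hazard: if the GUC were USERSET, nothing would stop a session from doing something like the following before draining a slot, which is exactly what SIGHUP prevents (the slot name is just an example; the SET would be rejected today because the parameter is not session-settable):

SET standby_slot_names = '';
SELECT * FROM pg_logical_slot_get_changes('logicalslot', NULL, NULL);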