Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Sat, Jan 23, 2021 at 1:44 PM Bharath Rupireddy wrote: > > On Sat, Jan 23, 2021 at 11:50 AM Amit Kapila wrote: > > > > On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy > > wrote: > > > > > > On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila > > > wrote: > > > > > > Yes you are right. Looks like the above commit is causing the issue. I > > > reverted that commit and did not see the drop publication bug, see use > > > case [1]. > > > > > > > Okay, thanks for confirming the same. I am planning to push the > > attached where I made changes in few comments, combined the code and > > test patch, and made a few changes in the commit message. Let me know > > if you have any suggestions? > > Thanks Amit. Looks like there were some typos in the test case > description, I corrected them. > Thanks, I have pushed the patch. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Sat, Jan 23, 2021 at 11:50 AM Amit Kapila wrote: > > On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy > wrote: > > > > On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila wrote: > > > > Yes you are right. Looks like the above commit is causing the issue. I > > reverted that commit and did not see the drop publication bug, see use > > case [1]. > > > > Okay, thanks for confirming the same. I am planning to push the > attached where I made changes in few comments, combined the code and > test patch, and made a few changes in the commit message. Let me know > if you have any suggestions? Thanks Amit. Looks like there were some typos in the test case description, I corrected them. +# Drop the table from publicaiton +# Add the table to publicaiton again +# Rerfresh publication after table is added to publication Attaching v7 patch. I'm fine with it. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v7-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behavior.patch Description: Binary data
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy wrote: > > On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila wrote: > > Yes you are right. Looks like the above commit is causing the issue. I > reverted that commit and did not see the drop publication bug, see use > case [1]. > Okay, thanks for confirming the same. I am planning to push the attached where I made changes in few comments, combined the code and test patch, and made a few changes in the commit message. Let me know if you have any suggestions? -- With Regards, Amit Kapila. v6-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behavior.patch Description: Binary data
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy wrote: > > BTW, have we tried to check if this problem exists in back-branches? > > It seems to me the problem has been recently introduced by commit > > 69bd60672a. I am telling this by looking at code and have yet not > > performed any testing so I could be wrong. > > Yes you are right. Looks like the above commit is causing the issue. I > reverted that commit and did not see the drop publication bug, see use > case [1]. I think, to be precise the issue is because of not setting pubactions to false when if (!entry->replicate_valid) is true by the commit 69bd60672. In fact, it removed the correct check. Since found(denotes whether the cache entry exists or not) will be true even after the alter publication ... drop table, the cache entry->replicate_valid is the right check which will be set to false in rel_sync_cache_publication_cb. But this commit removed it. And also the reason for not catching this bug during the testing of 69bd60672 commit's patch is that we don't have a test case for alter publication ... drop table behaviour. - if (!found || !entry->replicate_valid) + if (!found) + { + /* immediately make a new entry valid enough to satisfy callbacks */ + entry->schema_sent = false; + entry->streamed_txns = NIL; + entry->replicate_valid = false; + entry->pubactions.pubinsert = entry->pubactions.pubupdate = + entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false; + entry->publish_as_relid = InvalidOid; + } + + /* Validate the entry */ + if (!entry->replicate_valid) { List *pubids = GetRelationPublications(relid); ListCell *lc; @@ -977,9 +987,6 @@ get_rel_sync_entry(PGOutputData *data, Oid relid) * relcache considers all publications given relation is in, but here * we only need to consider ones that the subscriber requested. */ - entry->pubactions.pubinsert = entry->pubactions.pubupdate = - entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false; - With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila wrote: > Thanks for the patch. Few comments: > + > +# Test replication with multiple publications for subscription > + > > While checking the existing tests in 001_rep_changes.pl, I came across > the below test which has multiple publications, won't that suffice the > need? > > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' > PUBLICATION tap_pub, tap_pub_ins_only" Yeah I saw that test case earlier, but we couldn't catch the error with it. Our earlier fix was to set pubactions to false whenever publish is false in get_rel_sync_entry foreach(lc, data->publications) { //publish is set to false when the given relation's publication is not in the list of publications provided by subscription i.e. data->publications if(!publish) { //set all entry->pubactions to false } } The existing use case: The publication tap_pub has tables tab_rep, tab_full, tab_full2, tab_mixed, tab_include, tab_nothing, tab_full_pk The publication tap_pub_ins_only has table tab_ins CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub, tap_pub_ins_only; INSERT INTO tab_ins SELECT generate_series(1,50); And the first insertion after the subscription is created happens into the table tab_ins which is with tap_pub_ins_only i.e. 2nd publication in the list of publications specified in the create subscription statement. The existing use case analysis: given relation tab_ins's publication is tap_pub_ins_only data->publications list contains - tap_pub, tap_pub_ins_only loop 1: data->publications list 1st element is tap_pub and relation tab_ins's publication is tap_pub_ins_only, so publish is false, pubations are set to false. loop 2: data->publications list 2nd element is tap_pub_ins_only and relation tab_ins's publication is also tap_pub_ins_only, publish is true, pubations are set to true, loop is exited with entry->pubactions set to true, hence the replication happens for the table. japin's use case [1]: But, in the japin use case[1] where we saw regression with the initial version of the fix, the first insertion happens into the table t1 which is with publication mypub1 i.e. 1st in the list of publications associated with the subscription. japin's use case [1] analysis: given relation t1's publication is mypub1 data->publications list contains - mypub1, mypub2 loop 1: data->publications list 1st element is mypub1 and relation t1's publication is also mypub1, so publish is true, pubations.insert is set to true, other actions are false because the publication mypub1 is created WITH (publish='insert');, so the loop can't get exited from below break condition. if (entry->pubactions.pubinsert && entry->pubactions.pubupdate && entry->pubactions.pubdelete && entry->pubactions.pubtruncate) break; loop 2: data->publications list 2nd element is mypub2 and relation t1's publication is mypub1, so publish is false, all pubations are set to false. loop is exited with all entry->pubactions set to false, hence the replication doesn't happen for the table t1. >From the above analysis, it turns out that the existing use case didn't catch the regression, that is why we had to add a new test case. [1] Step (1) -- On publisher CREATE TABLE t1 (a int); CREATE TABLE t2 (a int); CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert'); CREATE PUBLICATION mypub2 FOR TABLE t2; Step (2) -- On subscriber CREATE TABLE t1 (a int); CREATE TABLE t2 (a int); CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub1, mypub2; Step (3) -- On publisher INSERT INTO t1 VALUES (1); Step (4) -- On subscriber SELECT * FROM t1; > BTW, have we tried to check if this problem exists in back-branches? > It seems to me the problem has been recently introduced by commit > 69bd60672a. I am telling this by looking at code and have yet not > performed any testing so I could be wrong. Yes you are right. Looks like the above commit is causing the issue. I reverted that commit and did not see the drop publication bug, see use case [1]. [1] -- On publisher CREATE TABLE t1 (a int); INSERT INTO t1 VALUES (1); CREATE PUBLICATION mypub1 FOR TABLE t1; -- On subscriber CREATE TABLE t1 (a int); CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub1; -- On publisher INSERT INTO t1 VALUES (2); -- On subscriber SELECT * FROM t1; -- On publisher ALTER PUBLICATION mypub1 DROP TABLE t1; INSERT INTO t1 VALUES (3); -- On subscriber SELECT * FROM t1; --> we should not see value 3, if the alter publication ... drop table worked correctly. -- On publisher INSERT INTO t1 SELECT * FROM generate_series(4, 88); -- On subscriber SELECT * FROM t1; With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila wrote: > > On Sat, Jan 16, 2021 at 6:08 PM Bharath Rupireddy > wrote: > > > > On Sat, Jan 16, 2021 at 12:21 PM Bharath Rupireddy > > wrote: > > > > In the test, can we have multiple publications for the subscription > > > > because that is how we discovered that the originally proposed patch > > > > was not correct? Also, is it possible to extend some of the existing > > > > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost > > > > of the replication setup. > > > > > > Sure, I will add the multiple publications use case provided by japin > > > and also see if I can move the tests from 100_bugs.pl to > > > 0001_rep_changes.pl. > > > > Attaching v5 patch set. 0001 - No change. 0002 - I moved the tests to > > 001_rep_changes.pl so that we could avoid creation of subscriber, > > publisher nodes and other set up. > > > > Thanks for the patch. Few comments: > > + > +# Test replication with multiple publications for subscription > + > > While checking the existing tests in 001_rep_changes.pl, I came across > the below test which has multiple publications, won't that suffice the > need? > > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' > PUBLICATION tap_pub, tap_pub_ins_only" > > BTW, have we tried to check if this problem exists in back-branches? > It seems to me the problem has been recently introduced by commit > 69bd60672a. I am telling this by looking at code and have yet not > performed any testing so I could be wrong. Thanks for the comments Amit. I will update soon. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Sat, Jan 16, 2021 at 6:08 PM Bharath Rupireddy wrote: > > On Sat, Jan 16, 2021 at 12:21 PM Bharath Rupireddy > wrote: > > > In the test, can we have multiple publications for the subscription > > > because that is how we discovered that the originally proposed patch > > > was not correct? Also, is it possible to extend some of the existing > > > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost > > > of the replication setup. > > > > Sure, I will add the multiple publications use case provided by japin > > and also see if I can move the tests from 100_bugs.pl to > > 0001_rep_changes.pl. > > Attaching v5 patch set. 0001 - No change. 0002 - I moved the tests to > 001_rep_changes.pl so that we could avoid creation of subscriber, > publisher nodes and other set up. > Thanks for the patch. Few comments: + +# Test replication with multiple publications for subscription + While checking the existing tests in 001_rep_changes.pl, I came across the below test which has multiple publications, won't that suffice the need? "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub, tap_pub_ins_only" BTW, have we tried to check if this problem exists in back-branches? It seems to me the problem has been recently introduced by commit 69bd60672a. I am telling this by looking at code and have yet not performed any testing so I could be wrong. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Sat, Jan 16, 2021 at 12:21 PM Bharath Rupireddy wrote: > > In the test, can we have multiple publications for the subscription > > because that is how we discovered that the originally proposed patch > > was not correct? Also, is it possible to extend some of the existing > > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost > > of the replication setup. > > Sure, I will add the multiple publications use case provided by japin > and also see if I can move the tests from 100_bugs.pl to > 0001_rep_changes.pl. Attaching v5 patch set. 0001 - No change. 0002 - I moved the tests to 001_rep_changes.pl so that we could avoid creation of subscriber, publisher nodes and other set up. I also added the multiple publication for the subscription test case which was failing with v2-0001 patch. Note that I had to create subscriber, publications for this test case, because I couldn't find the regression (on v2-001 patch) with any of the existing test cases and also I'm dropping them after the test so that it will not harm any existing following tests. Hope that's okay. With v5-0002 and v2-0001, the multiple publication for the subscription test case fails. Please consider the v5 patch set for further review. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From 7dc6f91ca7225119b4fe6b3f0869385519721d2c Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Sat, 16 Jan 2021 11:30:09 +0530 Subject: [PATCH v5 1/1] Fix ALTER PUBLICATION...DROP TABLE behaviour Currently, in logical replication, publisher/walsender publishes the tables even though they aren't part of the publication i.e they are dropped from the publication. Because of this ALTER PUBLICATION...DROP TABLE doesn't work as expected. To fix above issue, when the entry got invalidated in rel_sync_cache_publication_cb(), mark the pubactions to false and let get_rel_sync_entry() recalculate the pubactions. --- src/backend/replication/pgoutput/pgoutput.c | 13 + 1 file changed, 13 insertions(+) diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c index 2f01137b42..478cd1f9f5 100644 --- a/src/backend/replication/pgoutput/pgoutput.c +++ b/src/backend/replication/pgoutput/pgoutput.c @@ -1179,5 +1179,18 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue) */ hash_seq_init(&status, RelationSyncCache); while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL) + { entry->replicate_valid = false; + + /* + * There might be some relations dropped from the publication, we do + * not need to publish the changes for them. Since we cannot get the + * the dropped relations (see above), we reset pubactions for all + * entries. + */ + entry->pubactions.pubinsert = false; + entry->pubactions.pubupdate = false; + entry->pubactions.pubdelete = false; + entry->pubactions.pubtruncate = false; + } } -- 2.25.1 From 95f4855a1f7ecaa8e2897a154e9347945770e76c Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Sat, 16 Jan 2021 17:13:06 +0530 Subject: [PATCH v5 2/2] Test behaviour of ALTER PUBLICATION ... DROP TABLE --- src/test/subscription/t/001_rep_changes.pl | 95 +- 1 file changed, 94 insertions(+), 1 deletion(-) diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl index 0680f44a1a..215ccf40c9 100644 --- a/src/test/subscription/t/001_rep_changes.pl +++ b/src/test/subscription/t/001_rep_changes.pl @@ -3,7 +3,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 23; +use Test::More tests => 27; # Initialize publisher node my $node_publisher = get_new_node('publisher'); @@ -153,6 +153,99 @@ is($result, qq(20|-20|-1), $node_publisher->safe_psql('postgres', "INSERT INTO tab_full SELECT generate_series(1,10)"); +# Test behaviour of ALTER PUBLICATION ... DROP TABLE + +# When publisher drops a table from publication, it should also stop sending +# its changes to subscriber. We look at the subscriber whether it receives +# the row that is inserted to the table in the publisher after it is dropped +# from the publication. + +$result = $node_subscriber->safe_psql('postgres', + "SELECT count(*), min(a), max(a) FROM tab_ins"); +is($result, qq(1052|1|1002), 'check rows on subscriber before table drop from publication'); + +# Drop the table from publicaiton +$node_publisher->safe_psql('postgres', + "ALTER PUBLICATION tap_pub_ins_only DROP TABLE tab_ins"); + +# Insert a row in publisher, but publisher will not send this row to subscriber +$node_publisher->safe_psql('postgres', "INSERT INTO tab_ins VALUES()"); + +$node_publisher->wait_for_catchup('tap_sub'); + +# Subscriber will not receive the inserted row, after table is dropped from +# publication, so row count should remain the same. +$result = $node_subscriber->safe_psql('postgres', + "SELECT count(*), min(a), max(a) FROM tab_ins"); +is($result, qq(10
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Sat, Jan 16, 2021 at 12:02 PM Amit Kapila wrote: > > On Sat, Jan 16, 2021 at 11:52 AM Bharath Rupireddy > wrote: > > > > I made an entry in the commitfest[1], so that the patches will get a > > chance to run on all the platforms. > > > > Attaching v4 patch set i.e. 0001 - fix, 0002 - test case. > > > > In the test, can we have multiple publications for the subscription > because that is how we discovered that the originally proposed patch > was not correct? Also, is it possible to extend some of the existing > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost > of the replication setup. Sure, I will add the multiple publications use case provided by japin and also see if I can move the tests from 100_bugs.pl to 0001_rep_changes.pl. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Sat, 16 Jan 2021 at 14:21, Bharath Rupireddy wrote: > I made an entry in the commitfest[1], so that the patches will get a > chance to run on all the platforms. > > Attaching v4 patch set i.e. 0001 - fix, 0002 - test case. > > [1] - https://commitfest.postgresql.org/32/2944/ > Thanks for the updated patch. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Sat, Jan 16, 2021 at 11:52 AM Bharath Rupireddy wrote: > > I made an entry in the commitfest[1], so that the patches will get a > chance to run on all the platforms. > > Attaching v4 patch set i.e. 0001 - fix, 0002 - test case. > In the test, can we have multiple publications for the subscription because that is how we discovered that the originally proposed patch was not correct? Also, is it possible to extend some of the existing tests in 001_rep_changes.pl or anywhere so that we can avoid some cost of the replication setup. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
I made an entry in the commitfest[1], so that the patches will get a chance to run on all the platforms. Attaching v4 patch set i.e. 0001 - fix, 0002 - test case. [1] - https://commitfest.postgresql.org/32/2944/ With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v4-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behaviour.patch Description: Binary data v4-0002-Test-behaviour-of-ALTER-PUBLICATION-.-DROP-TABLE.patch Description: Binary data
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 15, 2021 at 4:54 PM japin wrote: > On Fri, 15 Jan 2021 at 18:57, Bharath Rupireddy > wrote: > > On Fri, Jan 15, 2021 at 4:03 PM Amit Kapila wrote: > >> That sounds like a better way to fix and in fact, I was about to > >> suggest the same after reading your previous email. I'll think more on > >> this but in the meantime, can you add the test case in the patch as > >> requested earlier as well. > > > > @Li Japin Please let me know if you have already started to work on > > the test case, otherwise I can make a 0002 patch for the test case and > > post. > > > > Yeah, I'm working on the test case. Since I'm not familair with the logical > replication test, it may take some times. Thanks a lot. I quickly added the initial use case where we saw the bug, attached is 0002 patch. Please have a look and add further use cases if necessary. If okay, it's better to make a cf entry. I have one comment in v3-0001 patch, + * There might some relations dropped from the publication, we do I think we should change it to - "There might be some relations". With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v3-0002-Test-behaviour-of-ALTER-PUBLICATION-.-DROP-TABLE.patch Description: Binary data
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, 15 Jan 2021 at 18:57, Bharath Rupireddy wrote: > On Fri, Jan 15, 2021 at 4:03 PM Amit Kapila wrote: >> That sounds like a better way to fix and in fact, I was about to >> suggest the same after reading your previous email. I'll think more on >> this but in the meantime, can you add the test case in the patch as >> requested earlier as well. > > @Li Japin Please let me know if you have already started to work on > the test case, otherwise I can make a 0002 patch for the test case and > post. > Yeah, I'm working on the test case. Since I'm not familair with the logical replication test, it may take some times. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 15, 2021 at 4:03 PM Amit Kapila wrote: > That sounds like a better way to fix and in fact, I was about to > suggest the same after reading your previous email. I'll think more on > this but in the meantime, can you add the test case in the patch as > requested earlier as well. @Li Japin Please let me know if you have already started to work on the test case, otherwise I can make a 0002 patch for the test case and post. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 15, 2021 at 4:20 PM Amit Kapila wrote: > I feel it is better to not do anything for this because neither we > have a test to reproduce the problem nor is it clear from theory if > there is any problem to solve here. +1 to ignore 0002 patch. Thanks Amit. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 15, 2021 at 11:48 AM Bharath Rupireddy wrote: > > On Thu, Jan 14, 2021 at 10:53 AM Amit Kapila wrote: > > > 0002 - Invalidates the relation map cache in subscriber syscache > > > invalidation callbacks. Currently, I'm setting entry->state to > > > SUBREL_STATE_UNKNOWN in the new invalidation function that's > > > introduced logicalrep_relmap_invalidate, so that in the next call to > > > logicalrep_rel_open the entry's state will be read from the system > > > catalogues using GetSubscriptionRelState. If this is sound's bit > > > strange, I can add a boolean invalid to LogicalRepRelMapEntry > > > structure and set that here and in logicalrep_rel_open, I can have > > > something like: > > > if (entry->state != SUBREL_STATE_READY || entry->invalid) > > > entry->state = GetSubscriptionRelState(MySubscription->oid, > > >entry->localreloid, > > >&entry->statelsn); > > > > > >if (entry->invalid) > > > entry->invalid = false; > > > > > > > So, the point of the second patch is that it good to have such a > > thing, otherwise, we don't see any problem if we just use patch-0001, > > right? I think if we fix the first-one, automatically, we will achieve > > what we are trying to with the second-one because ultimately > > logicalrep_relmap_update will be called due to Alter Pub... Drop table > > .. step and it will mark the entry as SUBREL_STATE_UNKNOWN. > > On some further analysis, I found that logicalrep_relmap_update is > called in subscribers for inserts, delets, updates and truncate > statements on the dropped(from publication) relations in the > publisher. > > If any alters to pg_subscription, then we invalidate the subscription > in subscription_change_cb, maybe_reread_subscription subscription will > take care of re-reading from the system catalogues, see > apply_handle_->ensure_transaction -> maybe_reread_subscription. > And we don't have any entry in LogicalRepRelMapEntry that gets > affected because of changes to pg_subscription, so we don't need to > invalidate the rel map cache entries in subscription_change_cb. > > If any alters to pg_subscription_rel, then there are two parameters in > LogicalRepRelMapEntry structure, state and statelsn that may get > affected. Changes to statelsn column is taken care of with the > invalidation callback invalidate_syncing_table_states setting > table_states_valid to true and > process_syncing_tables->process_syncing_tables_for_apply. > I think you are saying the reverse here. The function invalidate_syncing_table_states sets table_states_valid to false and the other one sets it to true. > But, if > state column is changed somehow (although I'm not quite sure how it > can change to different states 'i', 'd', 's', 'r' after the initial > table sync phase in logical replication, > These states are for the initial copy-phase after that these won't change. > but we can pretty much do > something like update pg_subscription_rel set srsubstate = 'd';), in > this case invalidate_syncing_table_states gets called, but if we don't > revalidation of relation map cache entries, they will have the old > state value. > Hmm, modifying state values as you are suggesting have much dire consequences like in this case it can let tablesync worker to restart and try to do perform initial-copy again or it can lead to missing data in subscriber. We don't recommend to change system catalogs manually due to such problems. > So what I feel is at least we need the 0002 patch but with only > invalidating the entries in invalidate_syncing_table_states not in > subscription_change_cb, although I'm not able to back it up with any > use case or bug as such. > I feel it is better to not do anything for this because neither we have a test to reproduce the problem nor is it clear from theory if there is any problem to solve here. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 15, 2021 at 2:46 PM japin wrote: > > > > > I'm sorry, the 0001 patch is totally wrong. If we have only one > > publication, it works, > > however, this is a special case. If we have multiple publication in one > > subscription, > > it doesn't work. Here is a usecase. > > > > Step (1) > > -- On publisher > > CREATE TABLE t1 (a int); > > CREATE TABLE t2 (a int); > > CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert'); > > CREATE PUBLICATION mypub2 FOR TABLE t2; > > > > Step (2) > > -- On subscriber > > CREATE TABLE t1 (a int); > > CREATE TABLE t2 (a int); > > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 > > dbname=postgres' PUBLICATION mypub1, mypub2; > > > > Step (3) > > -- On publisher > > INSERT INTO t1 VALUES (1); > > > > Step (4) > > -- On subscriber > > SELECT * FROM t1; > > > > In step (4), we cannot get the records, because there has two publications > > in > > data->publications, so we will check one by one. > > The first publication is "mypub1" and entry->pubactions.pubinsert will be > > set > > true, however, in the second round, the publication is "mypub2", and we > > cannot > > find pub->oid in pubids (only oid of "mypub1"), so it will set all > > entry->pubactions > > to false, and nothing will be replicate to subscriber. > > > > My apologies! > > As I mentioned in previous thread, if there has multiple publications in > single subscription, it might lead data loss. > > When the entry got invalidated in rel_sync_cache_publication_cb(), I think we > should mark the pubactions to false, and let get_rel_sync_entry() recalculate > the pubactions. > > 0001 - modify as above described. > 0002 - do not change. > > Any thought? > That sounds like a better way to fix and in fact, I was about to suggest the same after reading your previous email. I'll think more on this but in the meantime, can you add the test case in the patch as requested earlier as well. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, 15 Jan 2021 at 15:49, japin wrote: > On Fri, 15 Jan 2021 at 14:50, Bharath Rupireddy > wrote: >> On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie >> wrote: >>> >>> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote >>> > > Do we really need to access PUBLICATIONRELMAP in this patch? What if >>> > > we just set it to false in the else condition of (if (publish && >>> > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) >>> > > >>> > > Thank for you review. I agree with you, it doesn’t need to access >>> > > PUBLICATIONRELMAP, since We already get the publication oid in >>> > > GetRelationPublications(relid) [1], which also access >>> > > PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the >>> > publish is false, so we do not need publish the table. >>> > >>> > +1. This is enough. >>> > >>> > > I have another question, the data->publications is a list, when did it >>> > has more then one items? >>> > >>> > IIUC, when the single table is associated with multiple publications, then >>> > data->publications will have multiple entries. Though I have not tried, >>> > we can try having two or three publications for the same table and verify >>> > that. >>> > >>> > > 0001 - change as you suggest. >>> > > 0002 - does not change. >>> >>> >>> Hi >>> >>> In 0001 patch >>> >>> The comments says " Relation is not associated with the publication anymore >>> " >>> That comment was accurate when check is_relation_part_of_publication() in >>> the last patch. >>> >>> But when put the code in the else branch, not only the case in the >>> comments(Relation is dropped), >>> Some other case such as "partitioned tables" will hit else branch too. >>> >>> Do you think we need fix the comment a little to make it accurate? >> >> Thanks, that comment needs a rework, in fact the else condition >> also(because we may fall into else branch even when publish is true >> and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is >> false, but our intention(to fix the bug reported here) is to have the >> else condition only when publish is false. And also instead of just >> relying on the publish variable which can get set even when the >> relation is not in the publication but ancestor_published is true, we >> can just have something like below to fix the bug reported here: >> >> if (publish && >> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)) >> { >> entry->pubactions.pubinsert |= pub->pubactions.pubinsert; >> entry->pubactions.pubupdate |= pub->pubactions.pubupdate; >> entry->pubactions.pubdelete |= pub->pubactions.pubdelete; >> entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate; >> } >> else if (!list_member_oid(pubids, pub->oid)) >> { >> /* >> * Relation is not associated with the publication anymore >> i.e >> * it would have been dropped from the publication. So we >> don't >> * need to publish the changes for it. >> */ >> entry->pubactions.pubinsert = false; >> entry->pubactions.pubupdate = false; >> entry->pubactions.pubdelete = false; >> entry->pubactions.pubtruncate = false; >> } >> >> So this way, the fix only focuses on what we have reported here and it >> doesn't cause any regressions may be, and the existing comment becomes >> appropriate. >> >> Thoughts? >> > > I'm sorry, the 0001 patch is totally wrong. If we have only one publication, > it works, > however, this is a special case. If we have multiple publication in one > subscription, > it doesn't work. Here is a usecase. > > Step (1) > -- On publisher > CREATE TABLE t1 (a int); > CREATE TABLE t2 (a int); > CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert'); > CREATE PUBLICATION mypub2 FOR TABLE t2; > > Step (2) > -- On subscriber > CREATE TABLE t1 (a int); > CREATE TABLE t2 (a int); > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 > dbname=postgres' PUBLICATION mypub1, mypub2; > > Step (3) > -- On publisher > INSERT INTO t1 VALUES (1); > > Step (4) > -- On subscriber > SELECT * FROM t1; > > In step (4), we cannot get the records, because there has two publications in > data->publications, so we will check one by one. > The first publication is "mypub1" and entry->pubactions.pubinsert will be set > true, however, in the second round, the publication is "mypub2", and we cannot > find pub->oid in pubids (only oid of "mypub1"), so it will set all > entry->pubactions > to false, and nothing will be replicate to subscriber. > > My apologies! As I mentioned in previous thread, if there has multiple publications in single subscription, it might lead data loss. When the entry got invalidated in rel_sync_cache_publication_cb(), I think we should mark the pubactions to false, and let get
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, 15 Jan 2021 at 14:50, Bharath Rupireddy wrote: > On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie > wrote: >> >> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote >> > > Do we really need to access PUBLICATIONRELMAP in this patch? What if >> > > we just set it to false in the else condition of (if (publish && >> > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) >> > > >> > > Thank for you review. I agree with you, it doesn’t need to access >> > > PUBLICATIONRELMAP, since We already get the publication oid in >> > > GetRelationPublications(relid) [1], which also access >> > > PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the >> > publish is false, so we do not need publish the table. >> > >> > +1. This is enough. >> > >> > > I have another question, the data->publications is a list, when did it >> > has more then one items? >> > >> > IIUC, when the single table is associated with multiple publications, then >> > data->publications will have multiple entries. Though I have not tried, >> > we can try having two or three publications for the same table and verify >> > that. >> > >> > > 0001 - change as you suggest. >> > > 0002 - does not change. >> >> >> Hi >> >> In 0001 patch >> >> The comments says " Relation is not associated with the publication anymore " >> That comment was accurate when check is_relation_part_of_publication() in >> the last patch. >> >> But when put the code in the else branch, not only the case in the >> comments(Relation is dropped), >> Some other case such as "partitioned tables" will hit else branch too. >> >> Do you think we need fix the comment a little to make it accurate? > > Thanks, that comment needs a rework, in fact the else condition > also(because we may fall into else branch even when publish is true > and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is > false, but our intention(to fix the bug reported here) is to have the > else condition only when publish is false. And also instead of just > relying on the publish variable which can get set even when the > relation is not in the publication but ancestor_published is true, we > can just have something like below to fix the bug reported here: > > if (publish && > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)) > { > entry->pubactions.pubinsert |= pub->pubactions.pubinsert; > entry->pubactions.pubupdate |= pub->pubactions.pubupdate; > entry->pubactions.pubdelete |= pub->pubactions.pubdelete; > entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate; > } > else if (!list_member_oid(pubids, pub->oid)) > { > /* > * Relation is not associated with the publication anymore i.e > * it would have been dropped from the publication. So we > don't > * need to publish the changes for it. > */ > entry->pubactions.pubinsert = false; > entry->pubactions.pubupdate = false; > entry->pubactions.pubdelete = false; > entry->pubactions.pubtruncate = false; > } > > So this way, the fix only focuses on what we have reported here and it > doesn't cause any regressions may be, and the existing comment becomes > appropriate. > > Thoughts? > I'm sorry, the 0001 patch is totally wrong. If we have only one publication, it works, however, this is a special case. If we have multiple publication in one subscription, it doesn't work. Here is a usecase. Step (1) -- On publisher CREATE TABLE t1 (a int); CREATE TABLE t2 (a int); CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert'); CREATE PUBLICATION mypub2 FOR TABLE t2; Step (2) -- On subscriber CREATE TABLE t1 (a int); CREATE TABLE t2 (a int); CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub1, mypub2; Step (3) -- On publisher INSERT INTO t1 VALUES (1); Step (4) -- On subscriber SELECT * FROM t1; In step (4), we cannot get the records, because there has two publications in data->publications, so we will check one by one. The first publication is "mypub1" and entry->pubactions.pubinsert will be set true, however, in the second round, the publication is "mypub2", and we cannot find pub->oid in pubids (only oid of "mypub1"), so it will set all entry->pubactions to false, and nothing will be replicate to subscriber. My apologies! -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie wrote: > > > On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote > > > Do we really need to access PUBLICATIONRELMAP in this patch? What if > > > we just set it to false in the else condition of (if (publish && > > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) > > > > > > Thank for you review. I agree with you, it doesn’t need to access > > > PUBLICATIONRELMAP, since We already get the publication oid in > > > GetRelationPublications(relid) [1], which also access > > > PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the > > publish is false, so we do not need publish the table. > > > > +1. This is enough. > > > > > I have another question, the data->publications is a list, when did it > > has more then one items? > > > > IIUC, when the single table is associated with multiple publications, then > > data->publications will have multiple entries. Though I have not tried, > > we can try having two or three publications for the same table and verify > > that. > > > > > 0001 - change as you suggest. > > > 0002 - does not change. > > > Hi > > In 0001 patch > > The comments says " Relation is not associated with the publication anymore " > That comment was accurate when check is_relation_part_of_publication() in the > last patch. > > But when put the code in the else branch, not only the case in the > comments(Relation is dropped), > Some other case such as "partitioned tables" will hit else branch too. > > Do you think we need fix the comment a little to make it accurate? Thanks, that comment needs a rework, in fact the else condition also(because we may fall into else branch even when publish is true and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is false, but our intention(to fix the bug reported here) is to have the else condition only when publish is false. And also instead of just relying on the publish variable which can get set even when the relation is not in the publication but ancestor_published is true, we can just have something like below to fix the bug reported here: if (publish && (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)) { entry->pubactions.pubinsert |= pub->pubactions.pubinsert; entry->pubactions.pubupdate |= pub->pubactions.pubupdate; entry->pubactions.pubdelete |= pub->pubactions.pubdelete; entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate; } else if (!list_member_oid(pubids, pub->oid)) { /* * Relation is not associated with the publication anymore i.e * it would have been dropped from the publication. So we don't * need to publish the changes for it. */ entry->pubactions.pubinsert = false; entry->pubactions.pubupdate = false; entry->pubactions.pubdelete = false; entry->pubactions.pubtruncate = false; } So this way, the fix only focuses on what we have reported here and it doesn't cause any regressions may be, and the existing comment becomes appropriate. Thoughts? > And it seems we can add a "continue;" in else branch. > Of course this it not necessary. As you said, it's not necessary because the following if condition will fail anyways. Having said that, if we add continue, then any future code that gets added after the following if condition will never get executed. Though the reasoning may sound silly, IMO let's not add the continue in the else branch. if (entry->pubactions.pubinsert && entry->pubactions.pubupdate && entry->pubactions.pubdelete && entry->pubactions.pubtruncate) break; With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Thu, Jan 14, 2021 at 10:53 AM Amit Kapila wrote: > > 0002 - Invalidates the relation map cache in subscriber syscache > > invalidation callbacks. Currently, I'm setting entry->state to > > SUBREL_STATE_UNKNOWN in the new invalidation function that's > > introduced logicalrep_relmap_invalidate, so that in the next call to > > logicalrep_rel_open the entry's state will be read from the system > > catalogues using GetSubscriptionRelState. If this is sound's bit > > strange, I can add a boolean invalid to LogicalRepRelMapEntry > > structure and set that here and in logicalrep_rel_open, I can have > > something like: > > if (entry->state != SUBREL_STATE_READY || entry->invalid) > > entry->state = GetSubscriptionRelState(MySubscription->oid, > >entry->localreloid, > >&entry->statelsn); > > > >if (entry->invalid) > > entry->invalid = false; > > > > So, the point of the second patch is that it good to have such a > thing, otherwise, we don't see any problem if we just use patch-0001, > right? I think if we fix the first-one, automatically, we will achieve > what we are trying to with the second-one because ultimately > logicalrep_relmap_update will be called due to Alter Pub... Drop table > .. step and it will mark the entry as SUBREL_STATE_UNKNOWN. On some further analysis, I found that logicalrep_relmap_update is called in subscribers for inserts, delets, updates and truncate statements on the dropped(from publication) relations in the publisher. If any alters to pg_subscription, then we invalidate the subscription in subscription_change_cb, maybe_reread_subscription subscription will take care of re-reading from the system catalogues, see apply_handle_->ensure_transaction -> maybe_reread_subscription. And we don't have any entry in LogicalRepRelMapEntry that gets affected because of changes to pg_subscription, so we don't need to invalidate the rel map cache entries in subscription_change_cb. If any alters to pg_subscription_rel, then there are two parameters in LogicalRepRelMapEntry structure, state and statelsn that may get affected. Changes to statelsn column is taken care of with the invalidation callback invalidate_syncing_table_states setting table_states_valid to true and process_syncing_tables->process_syncing_tables_for_apply. But, if state column is changed somehow (although I'm not quite sure how it can change to different states 'i', 'd', 's', 'r' after the initial table sync phase in logical replication, but we can pretty much do something like update pg_subscription_rel set srsubstate = 'd';), in this case invalidate_syncing_table_states gets called, but if we don't revalidation of relation map cache entries, they will have the old state value. So what I feel is at least we need the 0002 patch but with only invalidating the entries in invalidate_syncing_table_states not in subscription_change_cb, although I'm not able to back it up with any use case or bug as such. Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
RE: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
> On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote > > Do we really need to access PUBLICATIONRELMAP in this patch? What if > > we just set it to false in the else condition of (if (publish && > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) > > > > Thank for you review. I agree with you, it doesn’t need to access > > PUBLICATIONRELMAP, since We already get the publication oid in > > GetRelationPublications(relid) [1], which also access > > PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the > publish is false, so we do not need publish the table. > > +1. This is enough. > > > I have another question, the data->publications is a list, when did it > has more then one items? > > IIUC, when the single table is associated with multiple publications, then > data->publications will have multiple entries. Though I have not tried, > we can try having two or three publications for the same table and verify > that. > > > 0001 - change as you suggest. > > 0002 - does not change. Hi In 0001 patch The comments says " Relation is not associated with the publication anymore " That comment was accurate when check is_relation_part_of_publication() in the last patch. But when put the code in the else branch, not only the case in the comments(Relation is dropped), Some other case such as "partitioned tables" will hit else branch too. Do you think we need fix the comment a little to make it accurate? And it seems we can add a "continue;" in else branch. Of course this it not necessary. Best regards, houzj
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote: > > On Jan 14, 2021, at 1:25 PM, Amit Kapila wrote: > > Attaching following two patches: > > 0001 - Makes publisher to not publish the changes for the alter > publication ... dropped tables. Original patch is by japin, I added > comments, changed the function name and adjusted the code a bit. > > > Do we really need to access PUBLICATIONRELMAP in this patch? What if > we just set it to false in the else condition of (if (publish && > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) > > > Thank for you review. I agree with you, it doesn’t need to access > PUBLICATIONRELMAP, since > We already get the publication oid in GetRelationPublications(relid) [1], > which also access > PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the publish > is false, so we > do not need publish the table. > > I have another question, the data->publications is a list, when did it has > more then one items? > > 0001 - change as you suggest. > Can we add a test case for this patch as this seems to be a reproducible bug? We can add the test in src/test/subscription/100_bugs.pl unless you find a better place to add it. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
> On Jan 14, 2021, at 8:44 PM, Amit Kapila wrote: > > On Thu, Jan 14, 2021 at 6:02 PM japin wrote: >> >> On Thu, 14 Jan 2021 at 20:19, Bharath Rupireddy wrote: >>> On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote Do we really need to access PUBLICATIONRELMAP in this patch? What if we just set it to false in the else condition of (if (publish && (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) Thank for you review. I agree with you, it doesn’t need to access PUBLICATIONRELMAP, since We already get the publication oid in GetRelationPublications(relid) [1], which also access PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the publish is false, so we do not need publish the table. >>> >>> +1. This is enough. >>> I have another question, the data->publications is a list, when did it has more then one items? >>> >>> IIUC, when the single table is associated with multiple publications, >>> then data->publications will have multiple entries. Though I have not >>> tried, we can try having two or three publications for the same table >>> and verify that. >>> >> >> I try add one table into two publications, but the data->publications has >> only >> one item. Is there something I missed? >> > > I think you will have multiple publications in that list when the > subscriber has subscribed to multiple publications. For example, > Create Subscription ... Publication pub1, Publication pub2. > Thanks, you are right! When we create a subscription with multiple publications, the data->publications has more then one items. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Thu, Jan 14, 2021 at 6:02 PM japin wrote: > > On Thu, 14 Jan 2021 at 20:19, Bharath Rupireddy wrote: > > On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote > >> Do we really need to access PUBLICATIONRELMAP in this patch? What if > >> we just set it to false in the else condition of (if (publish && > >> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) > >> > >> Thank for you review. I agree with you, it doesn’t need to access > >> PUBLICATIONRELMAP, since > >> We already get the publication oid in GetRelationPublications(relid) [1], > >> which also access > >> PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the > >> publish is false, so we > >> do not need publish the table. > > > > +1. This is enough. > > > >> I have another question, the data->publications is a list, when did it has > >> more then one items? > > > > IIUC, when the single table is associated with multiple publications, > > then data->publications will have multiple entries. Though I have not > > tried, we can try having two or three publications for the same table > > and verify that. > > > > I try add one table into two publications, but the data->publications has only > one item. Is there something I missed? > I think you will have multiple publications in that list when the subscriber has subscribed to multiple publications. For example, Create Subscription ... Publication pub1, Publication pub2. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Thu, 14 Jan 2021 at 20:19, Bharath Rupireddy wrote: > On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote >> Do we really need to access PUBLICATIONRELMAP in this patch? What if >> we just set it to false in the else condition of (if (publish && >> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) >> >> Thank for you review. I agree with you, it doesn’t need to access >> PUBLICATIONRELMAP, since >> We already get the publication oid in GetRelationPublications(relid) [1], >> which also access >> PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the publish >> is false, so we >> do not need publish the table. > > +1. This is enough. > >> I have another question, the data->publications is a list, when did it has >> more then one items? > > IIUC, when the single table is associated with multiple publications, > then data->publications will have multiple entries. Though I have not > tried, we can try having two or three publications for the same table > and verify that. > I try add one table into two publications, but the data->publications has only one item. Is there something I missed? -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote > Do we really need to access PUBLICATIONRELMAP in this patch? What if > we just set it to false in the else condition of (if (publish && > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) > > Thank for you review. I agree with you, it doesn’t need to access > PUBLICATIONRELMAP, since > We already get the publication oid in GetRelationPublications(relid) [1], > which also access > PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the publish > is false, so we > do not need publish the table. +1. This is enough. > I have another question, the data->publications is a list, when did it has > more then one items? IIUC, when the single table is associated with multiple publications, then data->publications will have multiple entries. Though I have not tried, we can try having two or three publications for the same table and verify that. > 0001 - change as you suggest. > 0002 - does not change. Thanks for the updated patch. I will respond to Amit's previous comment on the 0002 patch soon. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Jan 14, 2021, at 1:25 PM, Amit Kapila mailto:amit.kapil...@gmail.com>> wrote: On Wed, Jan 13, 2021 at 5:40 PM Bharath Rupireddy mailto:bharath.rupireddyforpostg...@gmail.com>> wrote: On Wed, Jan 13, 2021 at 3:16 PM Bharath Rupireddy mailto:bharath.rupireddyforpostg...@gmail.com>> wrote: On Wed, Jan 13, 2021 at 2:36 PM japin mailto:japi...@hotmail.com>> wrote: In summary, I feel we need to fix the publisher sending the inserts even though the table is dropped from the publication, that is the patch patch proposed by japin. This solves the bug reported in this thread. And also, it's good to have the LogicalRepRelMap invalidation fix as a 0002 patch in invalidate_syncing_table_states, subscription_change_cb and logicalrep_rel_open as proposed by me. Thoughts? I think invalidate the LogicalRepRelMap is necessary. If the table isn't in subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap? IIUC, it's not possible to know the relid of the alter publication...dropped table in the invalidation callbacks invalidate_syncing_table_states or subscription_change_cb, so it's better to set state of all the entries to SUBREL_STATE_UNKNOWN, so that, in the next logicalrep_rel_open call the function GetSubscriptionRelState will be called and the pg_subscription_rel will be read freshly. This is inline with the reasoning specified at [1]. [1] - https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com Attaching following two patches: 0001 - Makes publisher to not publish the changes for the alter publication ... dropped tables. Original patch is by japin, I added comments, changed the function name and adjusted the code a bit. Do we really need to access PUBLICATIONRELMAP in this patch? What if we just set it to false in the else condition of (if (publish && (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) Thank for you review. I agree with you, it doesn’t need to access PUBLICATIONRELMAP, since We already get the publication oid in GetRelationPublications(relid) [1], which also access PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the publish is false, so we do not need publish the table. I have another question, the data->publications is a list, when did it has more then one items? 0001 - change as you suggest. 0002 - does not change. Please consider v2 for further review. [1] List * GetRelationPublications(Oid relid) { List *result = NIL; CatCList *pubrellist; int i; /* Find all publications associated with the relation. */ pubrellist = SearchSysCacheList1(PUBLICATIONRELMAP, ObjectIdGetDatum(relid)); for (i = 0; i < pubrellist->n_members; i++) { HeapTuple tup = &pubrellist->members[i]->tuple; Oid pubid = ((Form_pg_publication_rel) GETSTRUCT(tup))->prpubid; result = lappend_oid(result, pubid); } ReleaseSysCacheList(pubrellist); return result; } -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. v2-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behaviour.patch Description: v2-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behaviour.patch v2-0002-Invalidate-relation-map-cache-in-subscriber-sysca.patch Description: v2-0002-Invalidate-relation-map-cache-in-subscriber-sysca.patch
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 5:40 PM Bharath Rupireddy wrote: > > On Wed, Jan 13, 2021 at 3:16 PM Bharath Rupireddy > wrote: > > > > On Wed, Jan 13, 2021 at 2:36 PM japin wrote: > > > > In summary, I feel we need to fix the publisher sending the inserts > > > > even though the table is dropped from the publication, that is the > > > > patch patch proposed by japin. This solves the bug reported in this > > > > thread. > > > > > > > > And also, it's good to have the LogicalRepRelMap invalidation fix as a > > > > 0002 patch in invalidate_syncing_table_states, subscription_change_cb > > > > and logicalrep_rel_open as proposed by me. > > > > > > > > Thoughts? > > > > > > > > > > I think invalidate the LogicalRepRelMap is necessary. If the table isn't > > > in > > > subscription, can we remove the LogicalRepRelMapEntry from > > > LogicalRepRelMap? > > > > IIUC, it's not possible to know the relid of the alter > > publication...dropped table in the invalidation callbacks > > invalidate_syncing_table_states or subscription_change_cb, so it's > > better to set state of all the entries to SUBREL_STATE_UNKNOWN, so > > that, in the next logicalrep_rel_open call the function > > GetSubscriptionRelState will be called and the pg_subscription_rel > > will be read freshly. > > > > This is inline with the reasoning specified at [1]. > > > > [1] - > > https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com > > Attaching following two patches: > > 0001 - Makes publisher to not publish the changes for the alter > publication ... dropped tables. Original patch is by japin, I added > comments, changed the function name and adjusted the code a bit. > Do we really need to access PUBLICATIONRELMAP in this patch? What if we just set it to false in the else condition of (if (publish && (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) > 0002 - Invalidates the relation map cache in subscriber syscache > invalidation callbacks. Currently, I'm setting entry->state to > SUBREL_STATE_UNKNOWN in the new invalidation function that's > introduced logicalrep_relmap_invalidate, so that in the next call to > logicalrep_rel_open the entry's state will be read from the system > catalogues using GetSubscriptionRelState. If this is sound's bit > strange, I can add a boolean invalid to LogicalRepRelMapEntry > structure and set that here and in logicalrep_rel_open, I can have > something like: > if (entry->state != SUBREL_STATE_READY || entry->invalid) > entry->state = GetSubscriptionRelState(MySubscription->oid, >entry->localreloid, >&entry->statelsn); > >if (entry->invalid) > entry->invalid = false; > So, the point of the second patch is that it good to have such a thing, otherwise, we don't see any problem if we just use patch-0001, right? I think if we fix the first-one, automatically, we will achieve what we are trying to with the second-one because ultimately logicalrep_relmap_update will be called due to Alter Pub... Drop table .. step and it will mark the entry as SUBREL_STATE_UNKNOWN. > make check and make check-world passes on both patches. > > If okay with the fixes, I will try to add a test case and also a cf > entry so that these patches get a chance to be tested on all the > platforms. > I think it is good to follow both the steps (adding a testcase and registering it to CF). -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 3:16 PM Bharath Rupireddy wrote: > > On Wed, Jan 13, 2021 at 2:36 PM japin wrote: > > > In summary, I feel we need to fix the publisher sending the inserts > > > even though the table is dropped from the publication, that is the > > > patch patch proposed by japin. This solves the bug reported in this > > > thread. > > > > > > And also, it's good to have the LogicalRepRelMap invalidation fix as a > > > 0002 patch in invalidate_syncing_table_states, subscription_change_cb > > > and logicalrep_rel_open as proposed by me. > > > > > > Thoughts? > > > > > > > I think invalidate the LogicalRepRelMap is necessary. If the table isn't in > > subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap? > > IIUC, it's not possible to know the relid of the alter > publication...dropped table in the invalidation callbacks > invalidate_syncing_table_states or subscription_change_cb, so it's > better to set state of all the entries to SUBREL_STATE_UNKNOWN, so > that, in the next logicalrep_rel_open call the function > GetSubscriptionRelState will be called and the pg_subscription_rel > will be read freshly. > > This is inline with the reasoning specified at [1]. > > [1] - > https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com Attaching following two patches: 0001 - Makes publisher to not publish the changes for the alter publication ... dropped tables. Original patch is by japin, I added comments, changed the function name and adjusted the code a bit. 0002 - Invalidates the relation map cache in subscriber syscache invalidation callbacks. Currently, I'm setting entry->state to SUBREL_STATE_UNKNOWN in the new invalidation function that's introduced logicalrep_relmap_invalidate, so that in the next call to logicalrep_rel_open the entry's state will be read from the system catalogues using GetSubscriptionRelState. If this is sound's bit strange, I can add a boolean invalid to LogicalRepRelMapEntry structure and set that here and in logicalrep_rel_open, I can have something like: if (entry->state != SUBREL_STATE_READY || entry->invalid) entry->state = GetSubscriptionRelState(MySubscription->oid, entry->localreloid, &entry->statelsn); if (entry->invalid) entry->invalid = false; make check and make check-world passes on both patches. If okay with the fixes, I will try to add a test case and also a cf entry so that these patches get a chance to be tested on all the platforms. Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com v1-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behaviour.patch Description: Binary data v1-0002-Invalidate-relation-map-cache-in-subscriber-sysca.patch Description: Binary data
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 5:15 PM Dilip Kumar wrote: > > On Wed, Jan 13, 2021 at 3:08 PM Bharath Rupireddy > wrote: > > > > On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar wrote: > > > > IIUC the logical replication only replicate the tables in publication, > > > > I think > > > > when the tables that aren't in publication should not be replicated. > > > > > > > > Attached the patch that fixes it. Thought? > > > > > > > > > > Instead of doing this, I would expect that the RelationSyncCache entry > > > should be removed when the relation is dropped from the publication. > > > So if that is done then it will reload the publication and then it > > > will not find that that relation as published and it will ignore the > > > changes. But the patch doesn't seem to be exactly on that line. Am I > > > missing something here? > > > > IIUC, it's not possible to remove the cache entry inside > > rel_sync_cache_publication_cb, because we don't receive the relid of > > the alter publication .. dropped relation in the invalidation > > callback. See the below comment, > > > > /* > > * There is no way to find which entry in our cache the hash belongs to > > so > > * mark the whole cache as invalid. > > */ > > hash_seq_init(&status, RelationSyncCache); > > while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL) > > entry->replicate_valid = false; > > So basically your point is that all the RelationSyncEntry are getting > invalidated right? And if so then in get_rel_sync_entry we will just > check the updated publication no? > I am referring to the below code. So ideally this should be already > working if the relcaches are getting invalidated? > > get_rel_sync_entry > { > entry = (RelationSyncEntry *) hash_search(RelationSyncCache, > (void *) &relid, > HASH_ENTER, &found); > Assert(entry != NULL); > > . > > /* Validate the entry */ > if (!entry->replicate_valid) > { > List *pubids = GetRelationPublications(relid); > ListCell *lc; > Oid publish_as_relid = relid; > ... here we will rechek the pulication > } Yes, after the entries got invalidated in rel_sync_cache_publication_cb, we get to the if (!entry->replicate_valid) part of the code in get_rel_sync_entry, but in below point, we don't mark the pubactions to false, which were earlier set to true. Because of this, we will still have pubactions to true because of which pgoutput_change publishes the changes. /* * Don't publish changes for partitioned tables, because * publishing those of its partitions suffices, unless partition * changes won't be published due to pubviaroot being set. */ if (publish && (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)) { entry->pubactions.pubinsert |= pub->pubactions.pubinsert; entry->pubactions.pubupdate |= pub->pubactions.pubupdate; entry->pubactions.pubdelete |= pub->pubactions.pubdelete; entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate; } if (entry->pubactions.pubinsert && entry->pubactions.pubupdate && entry->pubactions.pubdelete && entry->pubactions.pubtruncate) break; With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 3:08 PM Bharath Rupireddy wrote: > > On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar wrote: > > > IIUC the logical replication only replicate the tables in publication, I > > > think > > > when the tables that aren't in publication should not be replicated. > > > > > > Attached the patch that fixes it. Thought? > > > > > > > Instead of doing this, I would expect that the RelationSyncCache entry > > should be removed when the relation is dropped from the publication. > > So if that is done then it will reload the publication and then it > > will not find that that relation as published and it will ignore the > > changes. But the patch doesn't seem to be exactly on that line. Am I > > missing something here? > > IIUC, it's not possible to remove the cache entry inside > rel_sync_cache_publication_cb, because we don't receive the relid of > the alter publication .. dropped relation in the invalidation > callback. See the below comment, > > /* > * There is no way to find which entry in our cache the hash belongs to so > * mark the whole cache as invalid. > */ > hash_seq_init(&status, RelationSyncCache); > while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL) > entry->replicate_valid = false; So basically your point is that all the RelationSyncEntry are getting invalidated right? And if so then in get_rel_sync_entry we will just check the updated publication no? I am referring to the below code. So ideally this should be already working if the relcaches are getting invalidated? get_rel_sync_entry { entry = (RelationSyncEntry *) hash_search(RelationSyncCache, (void *) &relid, HASH_ENTER, &found); Assert(entry != NULL); . /* Validate the entry */ if (!entry->replicate_valid) { List *pubids = GetRelationPublications(relid); ListCell *lc; Oid publish_as_relid = relid; ... here we will rechek the pulication } -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 2:36 PM japin wrote: > > In summary, I feel we need to fix the publisher sending the inserts > > even though the table is dropped from the publication, that is the > > patch patch proposed by japin. This solves the bug reported in this > > thread. > > > > And also, it's good to have the LogicalRepRelMap invalidation fix as a > > 0002 patch in invalidate_syncing_table_states, subscription_change_cb > > and logicalrep_rel_open as proposed by me. > > > > Thoughts? > > > > I think invalidate the LogicalRepRelMap is necessary. If the table isn't in > subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap? IIUC, it's not possible to know the relid of the alter publication...dropped table in the invalidation callbacks invalidate_syncing_table_states or subscription_change_cb, so it's better to set state of all the entries to SUBREL_STATE_UNKNOWN, so that, in the next logicalrep_rel_open call the function GetSubscriptionRelState will be called and the pg_subscription_rel will be read freshly. This is inline with the reasoning specified at [1]. [1] - https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 3:08 PM Bharath Rupireddy wrote: > > On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar wrote: > > > IIUC the logical replication only replicate the tables in publication, I > > > think > > > when the tables that aren't in publication should not be replicated. > > > > > > Attached the patch that fixes it. Thought? > > > > > > > Instead of doing this, I would expect that the RelationSyncCache entry > > should be removed when the relation is dropped from the publication. > > So if that is done then it will reload the publication and then it > > will not find that that relation as published and it will ignore the > > changes. But the patch doesn't seem to be exactly on that line. Am I > > missing something here? > > IIUC, it's not possible to remove the cache entry inside > rel_sync_cache_publication_cb, because we don't receive the relid of > the alter publication .. dropped relation in the invalidation > callback. See the below comment, Hmm, yeah because nothing changed to the relation the change is for the publication so the invalidation is not registered for the relation entry. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 3:06 PM japin wrote: > > > On Wed, 13 Jan 2021 at 17:23, Dilip Kumar wrote: > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: > >> > >> > >> On Jan 12, 2021, at 5:47 PM, japin wrote: > >> > >> > >> On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote: > >> > >> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy > >> wrote: > >> > >> > >> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila > >> wrote: > >> > >> > >> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > >> wrote: > >> > >> > >> Hi, > >> > >> While providing thoughts on the design in [1], I found a strange > >> behaviour with the $subject. The use case is shown below as a sequence > >> of steps that need to be run on publisher and subscriber to arrive at > >> the strange behaviour. In step 5, the table is dropped from the > >> publication and in step 6, the refresh publication is run on the > >> subscriber, from here onwards, the expectation is that no further > >> inserts into the publisher table have to be replicated on to the > >> subscriber, but the opposite happens i.e. the inserts are still > >> replicated to the subscriber. ISTM as a bug. Let me know if I'm > >> missing anything. > >> > >> > >> Did you try to investigate what's going on? Can you please check what > >> is the behavior if, after step-5, you restart the subscriber and > >> separately try creating a new subscription (maybe on a different > >> server) for that publication after step-5 and see if that allows the > >> relation to be replicated? AFAIU, in AlterSubscription_refresh, we > >> remove such dropped rels and stop their corresponding apply workers > >> which should stop the further replication of such relations but that > >> doesn't seem to be happening in your case. > >> > >> > >> Here's my analysis: > >> 1) in the publisher, alter publication drop table successfully > >> removes(PublicationDropTables) the table from the catalogue > >> pg_publication_rel > >> 2) in the subscriber, alter subscription refresh publication > >> successfully removes the table from the catalogue pg_subscription_rel > >> (AlterSubscription_refresh->RemoveSubscriptionRel) > >> so far so good > >> > >> > >> Here, it should register the worker to stop on commit, and then on > >> commit it should call AtEOXact_ApplyLauncher to stop the apply worker. > >> Once the apply worker is stopped, the corresponding WALSender will > >> also be stopped. Something here is not happening as per expected > >> behavior. > >> > >> 3) after the insertion into the table in the publisher(remember that > >> it's dropped from the publication in (1)), the walsender process is > >> unable detect that the table has been dropped from the publication > >> i.e. it doesn't look at the pg_publication_rel catalogue or some > >> other, but it only does is_publishable_relation() check which returns > >> true in pgoutput_change(). Maybe the walsender should look at the > >> catalogue pg_publication_rel in is_publishable_relation()? > >> > >> > >> We must be somewhere checking pg_publication_rel before sending the > >> decoded change because otherwise, we would have sent the changes for > >> the table which are not even part of this publication. I think you can > >> try to create a separate table that is not part of the publication > >> under test and see how the changes for that are filtered. > >> > >> > >> I find that pgoutput_change() use a hash table RelationSyncCache to > >> cache the publication info for tables. When we drop tables from the > >> publication, the RelationSyncCache doesn't updated, so it replicate > >> records. > >> > >> > >> IIUC the logical replication only replicate the tables in publication, I > >> think > >> when the tables that aren't in publication should not be replicated. > >> > >> Attached the patch that fixes it. Thought? > >> > > > > Instead of doing this, I would expect that the RelationSyncCache entry > > should be removed when the relation is dropped from the publication. > > So if that is done then it will reload the publication and then it > > will not find that that relation as published and it will ignore the > > changes. But the patch doesn't seem to be exactly on that line. Am I > > missing something here? > > Even if we remove RelationSyncCache entry when the relation is dropped from > the publication, when the relation is change (insert/update), AFAIK it also > create an entry in RelationSyncCache, so removing the RelationSyncCache entry > is unnecessary. > That is true but the entry->replicate_valid will be false and that will be only set to true in get_rel_sync_entry. So now based on the new publication the pubaction will be updated ? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar wrote: > > IIUC the logical replication only replicate the tables in publication, I > > think > > when the tables that aren't in publication should not be replicated. > > > > Attached the patch that fixes it. Thought? > > > > Instead of doing this, I would expect that the RelationSyncCache entry > should be removed when the relation is dropped from the publication. > So if that is done then it will reload the publication and then it > will not find that that relation as published and it will ignore the > changes. But the patch doesn't seem to be exactly on that line. Am I > missing something here? IIUC, it's not possible to remove the cache entry inside rel_sync_cache_publication_cb, because we don't receive the relid of the alter publication .. dropped relation in the invalidation callback. See the below comment, /* * There is no way to find which entry in our cache the hash belongs to so * mark the whole cache as invalid. */ hash_seq_init(&status, RelationSyncCache); while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL) entry->replicate_valid = false; With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, 13 Jan 2021 at 17:23, Dilip Kumar wrote: > On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: >> >> >> On Jan 12, 2021, at 5:47 PM, japin wrote: >> >> >> On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote: >> >> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy >> wrote: >> >> >> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila wrote: >> >> >> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy >> wrote: >> >> >> Hi, >> >> While providing thoughts on the design in [1], I found a strange >> behaviour with the $subject. The use case is shown below as a sequence >> of steps that need to be run on publisher and subscriber to arrive at >> the strange behaviour. In step 5, the table is dropped from the >> publication and in step 6, the refresh publication is run on the >> subscriber, from here onwards, the expectation is that no further >> inserts into the publisher table have to be replicated on to the >> subscriber, but the opposite happens i.e. the inserts are still >> replicated to the subscriber. ISTM as a bug. Let me know if I'm >> missing anything. >> >> >> Did you try to investigate what's going on? Can you please check what >> is the behavior if, after step-5, you restart the subscriber and >> separately try creating a new subscription (maybe on a different >> server) for that publication after step-5 and see if that allows the >> relation to be replicated? AFAIU, in AlterSubscription_refresh, we >> remove such dropped rels and stop their corresponding apply workers >> which should stop the further replication of such relations but that >> doesn't seem to be happening in your case. >> >> >> Here's my analysis: >> 1) in the publisher, alter publication drop table successfully >> removes(PublicationDropTables) the table from the catalogue >> pg_publication_rel >> 2) in the subscriber, alter subscription refresh publication >> successfully removes the table from the catalogue pg_subscription_rel >> (AlterSubscription_refresh->RemoveSubscriptionRel) >> so far so good >> >> >> Here, it should register the worker to stop on commit, and then on >> commit it should call AtEOXact_ApplyLauncher to stop the apply worker. >> Once the apply worker is stopped, the corresponding WALSender will >> also be stopped. Something here is not happening as per expected >> behavior. >> >> 3) after the insertion into the table in the publisher(remember that >> it's dropped from the publication in (1)), the walsender process is >> unable detect that the table has been dropped from the publication >> i.e. it doesn't look at the pg_publication_rel catalogue or some >> other, but it only does is_publishable_relation() check which returns >> true in pgoutput_change(). Maybe the walsender should look at the >> catalogue pg_publication_rel in is_publishable_relation()? >> >> >> We must be somewhere checking pg_publication_rel before sending the >> decoded change because otherwise, we would have sent the changes for >> the table which are not even part of this publication. I think you can >> try to create a separate table that is not part of the publication >> under test and see how the changes for that are filtered. >> >> >> I find that pgoutput_change() use a hash table RelationSyncCache to >> cache the publication info for tables. When we drop tables from the >> publication, the RelationSyncCache doesn't updated, so it replicate >> records. >> >> >> IIUC the logical replication only replicate the tables in publication, I >> think >> when the tables that aren't in publication should not be replicated. >> >> Attached the patch that fixes it. Thought? >> > > Instead of doing this, I would expect that the RelationSyncCache entry > should be removed when the relation is dropped from the publication. > So if that is done then it will reload the publication and then it > will not find that that relation as published and it will ignore the > changes. But the patch doesn't seem to be exactly on that line. Am I > missing something here? Even if we remove RelationSyncCache entry when the relation is dropped from the publication, when the relation is change (insert/update), AFAIK it also create an entry in RelationSyncCache, so removing the RelationSyncCache entry is unnecessary. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: > > > On Jan 12, 2021, at 5:47 PM, japin wrote: > > > On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote: > > On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy > wrote: > > > On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila wrote: > > > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > wrote: > > > Hi, > > While providing thoughts on the design in [1], I found a strange > behaviour with the $subject. The use case is shown below as a sequence > of steps that need to be run on publisher and subscriber to arrive at > the strange behaviour. In step 5, the table is dropped from the > publication and in step 6, the refresh publication is run on the > subscriber, from here onwards, the expectation is that no further > inserts into the publisher table have to be replicated on to the > subscriber, but the opposite happens i.e. the inserts are still > replicated to the subscriber. ISTM as a bug. Let me know if I'm > missing anything. > > > Did you try to investigate what's going on? Can you please check what > is the behavior if, after step-5, you restart the subscriber and > separately try creating a new subscription (maybe on a different > server) for that publication after step-5 and see if that allows the > relation to be replicated? AFAIU, in AlterSubscription_refresh, we > remove such dropped rels and stop their corresponding apply workers > which should stop the further replication of such relations but that > doesn't seem to be happening in your case. > > > Here's my analysis: > 1) in the publisher, alter publication drop table successfully > removes(PublicationDropTables) the table from the catalogue > pg_publication_rel > 2) in the subscriber, alter subscription refresh publication > successfully removes the table from the catalogue pg_subscription_rel > (AlterSubscription_refresh->RemoveSubscriptionRel) > so far so good > > > Here, it should register the worker to stop on commit, and then on > commit it should call AtEOXact_ApplyLauncher to stop the apply worker. > Once the apply worker is stopped, the corresponding WALSender will > also be stopped. Something here is not happening as per expected > behavior. > > 3) after the insertion into the table in the publisher(remember that > it's dropped from the publication in (1)), the walsender process is > unable detect that the table has been dropped from the publication > i.e. it doesn't look at the pg_publication_rel catalogue or some > other, but it only does is_publishable_relation() check which returns > true in pgoutput_change(). Maybe the walsender should look at the > catalogue pg_publication_rel in is_publishable_relation()? > > > We must be somewhere checking pg_publication_rel before sending the > decoded change because otherwise, we would have sent the changes for > the table which are not even part of this publication. I think you can > try to create a separate table that is not part of the publication > under test and see how the changes for that are filtered. > > > I find that pgoutput_change() use a hash table RelationSyncCache to > cache the publication info for tables. When we drop tables from the > publication, the RelationSyncCache doesn't updated, so it replicate > records. > > > IIUC the logical replication only replicate the tables in publication, I think > when the tables that aren't in publication should not be replicated. > > Attached the patch that fixes it. Thought? > Instead of doing this, I would expect that the RelationSyncCache entry should be removed when the relation is dropped from the publication. So if that is done then it will reload the publication and then it will not find that that relation as published and it will ignore the changes. But the patch doesn't seem to be exactly on that line. Am I missing something here? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, 13 Jan 2021 at 16:49, Bharath Rupireddy wrote: > On Wed, Jan 13, 2021 at 11:27 AM Amit Kapila wrote: >> >> On Wed, Jan 13, 2021 at 11:08 AM Bharath Rupireddy >> wrote: >> > >> > On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila >> > wrote: >> > > >> > > On Tue, Jan 12, 2021 at 5:23 PM japin wrote: >> > > > >> > > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote: >> > > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: >> > > > >> IIUC the logical replication only replicate the tables in >> > > > >> publication, I think >> > > > >> when the tables that aren't in publication should not be replicated. >> > > > >> >> > > > >> Attached the patch that fixes it. Thought? >> > > > > >> > > > > With that change, we don't get the behaviour that's stated in the >> > > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove >> > > > > one or more tables from the publication. Note that adding tables to a >> > > > > publication that is already subscribed to will require a ALTER >> > > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side >> > > > > in >> > > > > order to become effective" - >> > > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html. >> > > > > >> > > > >> > > > The documentation only emphasize adding tables to a publication, not >> > > > include dropping tables from a publication. >> > > > >> > > >> > > Right and I think that is ensured by the subscriber by calling >> > > should_apply_changes_for_rel() which won't return true unless the >> > > newly added relation is not synced via Refresh Publication. So, this >> > > means with or without this patch we should be sending the changes of >> > > the newly published table from the publisher? >> > >> > Oh, my bad, alter subscription...refresh publication is required only when >> > the tables are added to the publisher. Patch by japin makes the walsender >> > process to stop sending the data to the subscriber/apply worker. The patch >> > is based on the idea of looking at the PUBLICATIONRELMAP in >> > get_rel_sync_entry when the entries have been invalidated in >> > rel_sync_cache_publication_cb because of alter publication...drop table. >> > , >> > When the alter subscription...refresh publication is run on the >> > subscriber, the SUBSCRIPTIONRELMAP catalogue gets invalidated but the >> > corresponding cache entries in the LogicalRepRelMap which is used by >> > logicalrep_rel_open are not invalidated. LogicalRepRelMap is used to know >> > the relations that are associated with the subscription. But we seem to >> > have not taken care of invalidating those entries, though we have the >> > invalidation callback invalidate_syncing_table_states registered for >> > SUBSCRIPTIONRELMAP in ApplyWorkerMain. So, we miss to work on updating the >> > entries in LogicalRepRelMap. >> > >> > IMO, the ideal way to fix this issue is 1) stop the walsender sending the >> > changes to dropped tables, for this japin patch works 2) we must mark all >> > the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states >> > so that in the next logicalrep_rel_open, if the entry is invalidated, then >> > we have to call GetSubscriptionRelState to get the latest state, as shown >> > in [1]. Likewise, we might also have to mark the cache entries invalid in >> > subscription_change_cb which is invalidation callback for pg_subscription >> > >> >> Is the second point you described here is related to the original >> bug-reported or will it cause some other problem? > > It can cause some other problems. I found this while investigating > from the subscriber perspective why the subscriber is accepting the > inserts even though the relation is removed from the > pg_subscription_rel catalogue after refresh publication. I ended up > finding that the cache entries are not being invalidated in > invalidate_syncing_table_states. > > Having said that, the patch proposed by japin is enough to solve the > bug reported here. > > While we are fixing the bug, I thought it's better to fix this as > well, maybe as a 0002 patch? If okay, I can work on the patch and post > it in a separate thread? > >> > Thoughts? >> > >> > [1] - >> > if (entry->state != SUBREL_STATE_READY || entry->invalid) >> > entry->state = GetSubscriptionRelState(MySubscription->oid, >> >entry->localreloid, >> >&entry->statelsn); >> > >> >if (entry->invalid) >> >entry->invalid = false; >> > >> > return entry; >> > >> > > I have another question on your patch which is why in some cases like >> > > when we have not inserted in step-5 (as mentioned by you) the >> > > following insertions are not sent. Is somehow we are setting the >> > > pubactions as false in that case, if so, how? >> > >> > The reason is that the issue reported in this thread occurs - when we have >> > the walsender process running, RelationSyn
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 11:27 AM Amit Kapila wrote: > > On Wed, Jan 13, 2021 at 11:08 AM Bharath Rupireddy > wrote: > > > > On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila > > wrote: > > > > > > On Tue, Jan 12, 2021 at 5:23 PM japin wrote: > > > > > > > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote: > > > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: > > > > >> IIUC the logical replication only replicate the tables in > > > > >> publication, I think > > > > >> when the tables that aren't in publication should not be replicated. > > > > >> > > > > >> Attached the patch that fixes it. Thought? > > > > > > > > > > With that change, we don't get the behaviour that's stated in the > > > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove > > > > > one or more tables from the publication. Note that adding tables to a > > > > > publication that is already subscribed to will require a ALTER > > > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in > > > > > order to become effective" - > > > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html. > > > > > > > > > > > > > The documentation only emphasize adding tables to a publication, not > > > > include dropping tables from a publication. > > > > > > > > > > Right and I think that is ensured by the subscriber by calling > > > should_apply_changes_for_rel() which won't return true unless the > > > newly added relation is not synced via Refresh Publication. So, this > > > means with or without this patch we should be sending the changes of > > > the newly published table from the publisher? > > > > Oh, my bad, alter subscription...refresh publication is required only when > > the tables are added to the publisher. Patch by japin makes the walsender > > process to stop sending the data to the subscriber/apply worker. The patch > > is based on the idea of looking at the PUBLICATIONRELMAP in > > get_rel_sync_entry when the entries have been invalidated in > > rel_sync_cache_publication_cb because of alter publication...drop table. > > , > > When the alter subscription...refresh publication is run on the subscriber, > > the SUBSCRIPTIONRELMAP catalogue gets invalidated but the corresponding > > cache entries in the LogicalRepRelMap which is used by logicalrep_rel_open > > are not invalidated. LogicalRepRelMap is used to know the relations that > > are associated with the subscription. But we seem to have not taken care of > > invalidating those entries, though we have the invalidation callback > > invalidate_syncing_table_states registered for SUBSCRIPTIONRELMAP in > > ApplyWorkerMain. So, we miss to work on updating the entries in > > LogicalRepRelMap. > > > > IMO, the ideal way to fix this issue is 1) stop the walsender sending the > > changes to dropped tables, for this japin patch works 2) we must mark all > > the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states > > so that in the next logicalrep_rel_open, if the entry is invalidated, then > > we have to call GetSubscriptionRelState to get the latest state, as shown > > in [1]. Likewise, we might also have to mark the cache entries invalid in > > subscription_change_cb which is invalidation callback for pg_subscription > > > > Is the second point you described here is related to the original > bug-reported or will it cause some other problem? It can cause some other problems. I found this while investigating from the subscriber perspective why the subscriber is accepting the inserts even though the relation is removed from the pg_subscription_rel catalogue after refresh publication. I ended up finding that the cache entries are not being invalidated in invalidate_syncing_table_states. Having said that, the patch proposed by japin is enough to solve the bug reported here. While we are fixing the bug, I thought it's better to fix this as well, maybe as a 0002 patch? If okay, I can work on the patch and post it in a separate thread? > > Thoughts? > > > > [1] - > > if (entry->state != SUBREL_STATE_READY || entry->invalid) > > entry->state = GetSubscriptionRelState(MySubscription->oid, > >entry->localreloid, > >&entry->statelsn); > > > >if (entry->invalid) > >entry->invalid = false; > > > > return entry; > > > > > I have another question on your patch which is why in some cases like > > > when we have not inserted in step-5 (as mentioned by you) the > > > following insertions are not sent. Is somehow we are setting the > > > pubactions as false in that case, if so, how? > > > > The reason is that the issue reported in this thread occurs - when we have > > the walsender process running, RelationSyncCache is initialized, we > > inserted some data into the table that's sent to the subscriber and the > > table is dropped, we miss to set the pubactions to false in > > get_rel_
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, 13 Jan 2021 at 13:26, Amit Kapila wrote: > On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy > wrote: >> >> On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila wrote: >> > > Here's my analysis: >> > > 1) in the publisher, alter publication drop table successfully >> > > removes(PublicationDropTables) the table from the catalogue >> > > pg_publication_rel >> > > 2) in the subscriber, alter subscription refresh publication >> > > successfully removes the table from the catalogue pg_subscription_rel >> > > (AlterSubscription_refresh->RemoveSubscriptionRel) >> > > so far so good >> > > >> > >> > Here, it should register the worker to stop on commit, and then on >> > commit it should call AtEOXact_ApplyLauncher to stop the apply worker. >> > Once the apply worker is stopped, the corresponding WALSender will >> > also be stopped. Something here is not happening as per expected >> > behavior. >> >> On the subscriber, an entry for worker stop is created in >> AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end >> of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it cannot >> be stopped because logicalrep_worker_find returns null >> (AtEOXact_ApplyLauncher --> logicalrep_worker_stop --> >> logicalrep_worker_find). The worker entry for that subscriber is having >> relid as 0 [1], due to which the following if condition will not be hit. The >> apply worker on the subscriber related to the subscription on which refresh >> publication was run is not closed. It looks like relid 0 is valid because it >> will be applicable only during the table sync phase, the comment in the >> LogicalRepWorker structure says that. >> >> And also, I think, expecting the apply worker to be closed this way doesn't >> make sense because the apply worker is a per-subscription base, and the >> subscription can have other tables too. >> > > Okay, that makes sense. As responded to Li Japin, let's focus on > figuring out why we are sending the changes from the publisher node in > some cases and not in other cases. After some analysis, I find that the dropped tables always replicate to subscriber. The difference is that if we drop the table from publication and refresh publication (on subscriber), the LogicalRepRelMapEntry in should_apply_changes_for_rel() set state to SUBREL_STATE_UNKNOWN. (gdb) p *rel $2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public", relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780, replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true, localreloid = 16412, localrel = 0x7f78705da1b8, attrmap = 0x5564fb017800, updatable = false, *state = 0 '\000'*, statelsn = 0} If we insert data between drop table from publication and refresh publication, the LogicalRepRelMapEntry state is always SUBREL_STATE_READY. (gdb) p *rel $2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public", relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, atttyps = 0x5564fb017780, replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = true, localreloid = 16412, localrel = 0x7f78705d9d38, attrmap = 0x5564fb017800, updatable = false, *state = 114 'r'*, statelsn = 23545672} I will dig why the state of LogicalRepRelMapEntry doesn't change in second case. Any suggestion is welcome! -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 11:08 AM Bharath Rupireddy wrote: > > On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila wrote: > > > > On Tue, Jan 12, 2021 at 5:23 PM japin wrote: > > > > > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote: > > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: > > > >> IIUC the logical replication only replicate the tables in publication, > > > >> I think > > > >> when the tables that aren't in publication should not be replicated. > > > >> > > > >> Attached the patch that fixes it. Thought? > > > > > > > > With that change, we don't get the behaviour that's stated in the > > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove > > > > one or more tables from the publication. Note that adding tables to a > > > > publication that is already subscribed to will require a ALTER > > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in > > > > order to become effective" - > > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html. > > > > > > > > > > The documentation only emphasize adding tables to a publication, not > > > include dropping tables from a publication. > > > > > > > Right and I think that is ensured by the subscriber by calling > > should_apply_changes_for_rel() which won't return true unless the > > newly added relation is not synced via Refresh Publication. So, this > > means with or without this patch we should be sending the changes of > > the newly published table from the publisher? > > Oh, my bad, alter subscription...refresh publication is required only when > the tables are added to the publisher. Patch by japin makes the walsender > process to stop sending the data to the subscriber/apply worker. The patch is > based on the idea of looking at the PUBLICATIONRELMAP in get_rel_sync_entry > when the entries have been invalidated in rel_sync_cache_publication_cb > because of alter publication...drop table. > , > When the alter subscription...refresh publication is run on the subscriber, > the SUBSCRIPTIONRELMAP catalogue gets invalidated but the corresponding cache > entries in the LogicalRepRelMap which is used by logicalrep_rel_open are not > invalidated. LogicalRepRelMap is used to know the relations that are > associated with the subscription. But we seem to have not taken care of > invalidating those entries, though we have the invalidation callback > invalidate_syncing_table_states registered for SUBSCRIPTIONRELMAP in > ApplyWorkerMain. So, we miss to work on updating the entries in > LogicalRepRelMap. > > IMO, the ideal way to fix this issue is 1) stop the walsender sending the > changes to dropped tables, for this japin patch works 2) we must mark all the > LogicalRepRelMap entries as invalid in invalidate_syncing_table_states so > that in the next logicalrep_rel_open, if the entry is invalidated, then we > have to call GetSubscriptionRelState to get the latest state, as shown in > [1]. Likewise, we might also have to mark the cache entries invalid in > subscription_change_cb which is invalidation callback for pg_subscription > Is the second point you described here is related to the original bug-reported or will it cause some other problem? > Thoughts? > > [1] - > if (entry->state != SUBREL_STATE_READY || entry->invalid) > entry->state = GetSubscriptionRelState(MySubscription->oid, >entry->localreloid, >&entry->statelsn); > >if (entry->invalid) >entry->invalid = false; > > return entry; > > > I have another question on your patch which is why in some cases like > > when we have not inserted in step-5 (as mentioned by you) the > > following insertions are not sent. Is somehow we are setting the > > pubactions as false in that case, if so, how? > > The reason is that the issue reported in this thread occurs - when we have > the walsender process running, RelationSyncCache is initialized, we inserted > some data into the table that's sent to the subscriber and the table is > dropped, we miss to set the pubactions to false in get_rel_sync_entry, though > the cache entries have been invalidated. > > In some cases, it works properly because the old walsender process was > stopped and when a new walsender is started, then the cache RelationSyncCache > gets initialized again and the pubactions will be set to false in > get_rel_sync_entry. > Why is walsender process was getting stopped in one case but not in another? -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila wrote: > > On Tue, Jan 12, 2021 at 5:23 PM japin wrote: > > > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote: > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: > > >> IIUC the logical replication only replicate the tables in publication, I think > > >> when the tables that aren't in publication should not be replicated. > > >> > > >> Attached the patch that fixes it. Thought? > > > > > > With that change, we don't get the behaviour that's stated in the > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove > > > one or more tables from the publication. Note that adding tables to a > > > publication that is already subscribed to will require a ALTER > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in > > > order to become effective" - > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html. > > > > > > > The documentation only emphasize adding tables to a publication, not > > include dropping tables from a publication. > > > > Right and I think that is ensured by the subscriber by calling > should_apply_changes_for_rel() which won't return true unless the > newly added relation is not synced via Refresh Publication. So, this > means with or without this patch we should be sending the changes of > the newly published table from the publisher? Oh, my bad, alter subscription...refresh publication is required only when the tables are added to the publisher. Patch by japin makes the walsender process to stop sending the data to the subscriber/apply worker. The patch is based on the idea of looking at the PUBLICATIONRELMAP in get_rel_sync_entry when the entries have been invalidated in rel_sync_cache_publication_cb because of alter publication...drop table. , When the alter subscription...refresh publication is run on the subscriber, the SUBSCRIPTIONRELMAP catalogue gets invalidated but the corresponding cache entries in the LogicalRepRelMap which is used by logicalrep_rel_open are not invalidated. LogicalRepRelMap is used to know the relations that are associated with the subscription. But we seem to have not taken care of invalidating those entries, though we have the invalidation callback invalidate_syncing_table_states registered for SUBSCRIPTIONRELMAP in ApplyWorkerMain. So, we miss to work on updating the entries in LogicalRepRelMap. IMO, the ideal way to fix this issue is 1) stop the walsender sending the changes to dropped tables, for this japin patch works 2) we must mark all the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states so that in the next logicalrep_rel_open, if the entry is invalidated, then we have to call GetSubscriptionRelState to get the latest state, as shown in [1]. Likewise, we might also have to mark the cache entries invalid in subscription_change_cb which is invalidation callback for pg_subscription Thoughts? [1] - if (entry->state != SUBREL_STATE_READY *|| entry->invalid*) entry->state = GetSubscriptionRelState(MySubscription->oid, entry->localreloid, &entry->statelsn); if (entry->invalid) entry->invalid = false; return entry; > I have another question on your patch which is why in some cases like > when we have not inserted in step-5 (as mentioned by you) the > following insertions are not sent. Is somehow we are setting the > pubactions as false in that case, if so, how? The reason is that the issue reported in this thread occurs - when we have the walsender process running, RelationSyncCache is initialized, we inserted some data into the table that's sent to the subscriber and the table is dropped, we miss to set the pubactions to false in get_rel_sync_entry, though the cache entries have been invalidated. In some cases, it works properly because the old walsender process was stopped and when a new walsender is started, then the cache RelationSyncCache gets initialized again and the pubactions will be set to false in get_rel_sync_entry. if (!found) { /* immediately make a new entry valid enough to satisfy callbacks */ entry->schema_sent = false; entry->streamed_txns = NIL; entry->replicate_valid = false; entry->pubactions.pubinsert = entry->pubactions.pubupdate = entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false; entry->publish_as_relid = InvalidOid; } Hope that clarifies why the issue happens in some cases and not in other cases. > > > The publisher stops sending the tuples whenever the relation gets > > > dropped from the publication, not waiting until alter subscription ... > > > refresh publication on the subscriber. > > > > > > > If we want to wait the subscriber executing alter subscription ... refresh publication, > > maybe we should send some feedback to walsender. How can we send this feedback to > > walsender in no
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy wrote: > > On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila wrote: > > > Here's my analysis: > > > 1) in the publisher, alter publication drop table successfully > > > removes(PublicationDropTables) the table from the catalogue > > > pg_publication_rel > > > 2) in the subscriber, alter subscription refresh publication > > > successfully removes the table from the catalogue pg_subscription_rel > > > (AlterSubscription_refresh->RemoveSubscriptionRel) > > > so far so good > > > > > > > Here, it should register the worker to stop on commit, and then on > > commit it should call AtEOXact_ApplyLauncher to stop the apply worker. > > Once the apply worker is stopped, the corresponding WALSender will > > also be stopped. Something here is not happening as per expected > > behavior. > > On the subscriber, an entry for worker stop is created in > AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end of > txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it cannot be > stopped because logicalrep_worker_find returns null (AtEOXact_ApplyLauncher > --> logicalrep_worker_stop --> logicalrep_worker_find). The worker entry for > that subscriber is having relid as 0 [1], due to which the following if > condition will not be hit. The apply worker on the subscriber related to the > subscription on which refresh publication was run is not closed. It looks > like relid 0 is valid because it will be applicable only during the table > sync phase, the comment in the LogicalRepWorker structure says that. > > And also, I think, expecting the apply worker to be closed this way doesn't > make sense because the apply worker is a per-subscription base, and the > subscription can have other tables too. > Okay, that makes sense. As responded to Li Japin, let's focus on figuring out why we are sending the changes from the publisher node in some cases and not in other cases. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 5:23 PM japin wrote: > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote: > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: > >> IIUC the logical replication only replicate the tables in publication, I > >> think > >> when the tables that aren't in publication should not be replicated. > >> > >> Attached the patch that fixes it. Thought? > > > > With that change, we don't get the behaviour that's stated in the > > document - "The ADD TABLE and DROP TABLE clauses will add and remove > > one or more tables from the publication. Note that adding tables to a > > publication that is already subscribed to will require a ALTER > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in > > order to become effective" - > > https://www.postgresql.org/docs/devel/sql-alterpublication.html. > > > > The documentation only emphasize adding tables to a publication, not > include dropping tables from a publication. > Right and I think that is ensured by the subscriber by calling should_apply_changes_for_rel() which won't return true unless the newly added relation is not synced via Refresh Publication. So, this means with or without this patch we should be sending the changes of the newly published table from the publisher? I have another question on your patch which is why in some cases like when we have not inserted in step-5 (as mentioned by you) the following insertions are not sent. Is somehow we are setting the pubactions as false in that case, if so, how? > > The publisher stops sending the tuples whenever the relation gets > > dropped from the publication, not waiting until alter subscription ... > > refresh publication on the subscriber. > > > > If we want to wait the subscriber executing alter subscription ... refresh > publication, > maybe we should send some feedback to walsender. How can we send this > feedback to > walsender in non-walreceiver process? > I don't think we need this if what I said above is correct. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote: > On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: >> IIUC the logical replication only replicate the tables in publication, I >> think >> when the tables that aren't in publication should not be replicated. >> >> Attached the patch that fixes it. Thought? > > With that change, we don't get the behaviour that's stated in the > document - "The ADD TABLE and DROP TABLE clauses will add and remove > one or more tables from the publication. Note that adding tables to a > publication that is already subscribed to will require a ALTER > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in > order to become effective" - > https://www.postgresql.org/docs/devel/sql-alterpublication.html. > The documentation only emphasize adding tables to a publication, not include dropping tables from a publication. > The publisher stops sending the tuples whenever the relation gets > dropped from the publication, not waiting until alter subscription ... > refresh publication on the subscriber. > If we want to wait the subscriber executing alter subscription ... refresh publication, maybe we should send some feedback to walsender. How can we send this feedback to walsender in non-walreceiver process? > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 4:47 PM Li Japin wrote: > IIUC the logical replication only replicate the tables in publication, I think > when the tables that aren't in publication should not be replicated. > > Attached the patch that fixes it. Thought? With that change, we don't get the behaviour that's stated in the document - "The ADD TABLE and DROP TABLE clauses will add and remove one or more tables from the publication. Note that adding tables to a publication that is already subscribed to will require a ALTER SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in order to become effective" - https://www.postgresql.org/docs/devel/sql-alterpublication.html. The publisher stops sending the tuples whenever the relation gets dropped from the publication, not waiting until alter subscription ... refresh publication on the subscriber. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila wrote: > > Here's my analysis: > > 1) in the publisher, alter publication drop table successfully > > removes(PublicationDropTables) the table from the catalogue > > pg_publication_rel > > 2) in the subscriber, alter subscription refresh publication > > successfully removes the table from the catalogue pg_subscription_rel > > (AlterSubscription_refresh->RemoveSubscriptionRel) > > so far so good > > > > Here, it should register the worker to stop on commit, and then on > commit it should call AtEOXact_ApplyLauncher to stop the apply worker. > Once the apply worker is stopped, the corresponding WALSender will > also be stopped. Something here is not happening as per expected > behavior. On the subscriber, an entry for worker stop is created in AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it cannot be stopped because logicalrep_worker_find returns null (AtEOXact_ApplyLauncher --> logicalrep_worker_stop --> logicalrep_worker_find). The worker entry for that subscriber is having relid as 0 [1], due to which the following if condition will not be hit. The apply worker on the subscriber related to the subscription on which refresh publication was run is not closed. It looks like relid 0 is valid because it will be applicable only during the table sync phase, the comment in the LogicalRepWorker structure says that. And also, I think, expecting the apply worker to be closed this way doesn't make sense because the apply worker is a per-subscription base, and the subscription can have other tables too. /* Search for attached worker for a given subscription id. */ for (i = 0; i < max_logical_replication_workers; i++) { LogicalRepWorker *w = &LogicalRepCtx->workers[i]; if (w->in_use && w->subid == subid && w->relid == relid && (!only_running || w->proc)) { res = w; break; } } [1] (gdb) p subid $5 = 16391 (gdb) p relid $6 = 16388 (gdb) p *w $4 = {launch_time = 663760343317760, in_use = true, generation = 1, proc = 0x7fdfd9a7cc90, dbid = 12872, userid = 10, *subid = 16391*,* relid = 0*, relstate = 0 '\000', relstate_lsn = 0, relmutex = 0 '\000', last_lsn = 22798424, last_send_time = 663760483945980, last_recv_time = 663760483946087, reply_lsn = 22798424, reply_time = 663760483945980} postgres=# select * from pg_stat_get_subscription(16391); subid | relid | pid | received_lsn |last_msg_send_time| last_msg_receipt_time | latest_end_lsn | latest_end_time ---+---++--+--+--++-- 16391 | | 466779 | 0/15BE140| 2021-01-12 15:26:48.778813+05:30 | 2021-01-12 15:26:48.778878+05:30 | 0/15BE140 | 2021-01-12 15:26:48.778813+05:30 (1 row) > > 3) after the insertion into the table in the publisher(remember that > > it's dropped from the publication in (1)), the walsender process is > > unable detect that the table has been dropped from the publication > > i.e. it doesn't look at the pg_publication_rel catalogue or some > > other, but it only does is_publishable_relation() check which returns > > true in pgoutput_change(). Maybe the walsender should look at the > > catalogue pg_publication_rel in is_publishable_relation()? > > > > We must be somewhere checking pg_publication_rel before sending the > decoded change because otherwise, we would have sent the changes for > the table which are not even part of this publication. I think you can > try to create a separate table that is not part of the publication > under test and see how the changes for that are filtered. As pointed out by japin in the next thread, the walsender process in the publisher uses RelationSyncCache to hold the relations to which the insertions need to be sent to the subscriber. The RelationSyncCache gets created during startup of the walsender process(pgoutput_startup->init_rel_sync_cache) and also rel_sync_cache_publication_cb callback gets registered. So, if any alters happen to the pg_publication_rel catalog table, the callback rel_sync_cache_publication_cb is called, all the entries in the RelationSyncCache are marked as invalid, with the expectation that on the next use of any cache entry in get_rel_sync_entry (pgoutput_change->get_rel_sync_entry), that entry is validated again. In the use case, the invalidation happens as expected in rel_sync_cache_publication_cb and while revalidating the entry in get_rel_sync_entry, since there is only one publication to which the given relation is attached to, the pubids will be null and we don't set the entry->pubactions.pubinsert/update/delete/truncate to false, so the publisher keeps publishing the inserts to the relation. /* Validate the entry */ if (!entry->replicate_valid) { List *pubids
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Jan 12, 2021, at 5:47 PM, japin mailto:japi...@hotmail.com>> wrote: On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote: On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy mailto:bharath.rupireddyforpostg...@gmail.com>> wrote: On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila mailto:amit.kapil...@gmail.com>> wrote: On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy mailto:bharath.rupireddyforpostg...@gmail.com>> wrote: Hi, While providing thoughts on the design in [1], I found a strange behaviour with the $subject. The use case is shown below as a sequence of steps that need to be run on publisher and subscriber to arrive at the strange behaviour. In step 5, the table is dropped from the publication and in step 6, the refresh publication is run on the subscriber, from here onwards, the expectation is that no further inserts into the publisher table have to be replicated on to the subscriber, but the opposite happens i.e. the inserts are still replicated to the subscriber. ISTM as a bug. Let me know if I'm missing anything. Did you try to investigate what's going on? Can you please check what is the behavior if, after step-5, you restart the subscriber and separately try creating a new subscription (maybe on a different server) for that publication after step-5 and see if that allows the relation to be replicated? AFAIU, in AlterSubscription_refresh, we remove such dropped rels and stop their corresponding apply workers which should stop the further replication of such relations but that doesn't seem to be happening in your case. Here's my analysis: 1) in the publisher, alter publication drop table successfully removes(PublicationDropTables) the table from the catalogue pg_publication_rel 2) in the subscriber, alter subscription refresh publication successfully removes the table from the catalogue pg_subscription_rel (AlterSubscription_refresh->RemoveSubscriptionRel) so far so good Here, it should register the worker to stop on commit, and then on commit it should call AtEOXact_ApplyLauncher to stop the apply worker. Once the apply worker is stopped, the corresponding WALSender will also be stopped. Something here is not happening as per expected behavior. 3) after the insertion into the table in the publisher(remember that it's dropped from the publication in (1)), the walsender process is unable detect that the table has been dropped from the publication i.e. it doesn't look at the pg_publication_rel catalogue or some other, but it only does is_publishable_relation() check which returns true in pgoutput_change(). Maybe the walsender should look at the catalogue pg_publication_rel in is_publishable_relation()? We must be somewhere checking pg_publication_rel before sending the decoded change because otherwise, we would have sent the changes for the table which are not even part of this publication. I think you can try to create a separate table that is not part of the publication under test and see how the changes for that are filtered. I find that pgoutput_change() use a hash table RelationSyncCache to cache the publication info for tables. When we drop tables from the publication, the RelationSyncCache doesn't updated, so it replicate records. IIUC the logical replication only replicate the tables in publication, I think when the tables that aren't in publication should not be replicated. Attached the patch that fixes it. Thought? -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. alter-publication-drop-table.patch Description: alter-publication-drop-table.patch
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote: > On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy > wrote: >> >> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila wrote: >> > >> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy >> > wrote: >> > > >> > > Hi, >> > > >> > > While providing thoughts on the design in [1], I found a strange >> > > behaviour with the $subject. The use case is shown below as a sequence >> > > of steps that need to be run on publisher and subscriber to arrive at >> > > the strange behaviour. In step 5, the table is dropped from the >> > > publication and in step 6, the refresh publication is run on the >> > > subscriber, from here onwards, the expectation is that no further >> > > inserts into the publisher table have to be replicated on to the >> > > subscriber, but the opposite happens i.e. the inserts are still >> > > replicated to the subscriber. ISTM as a bug. Let me know if I'm >> > > missing anything. >> > > >> > >> > Did you try to investigate what's going on? Can you please check what >> > is the behavior if, after step-5, you restart the subscriber and >> > separately try creating a new subscription (maybe on a different >> > server) for that publication after step-5 and see if that allows the >> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we >> > remove such dropped rels and stop their corresponding apply workers >> > which should stop the further replication of such relations but that >> > doesn't seem to be happening in your case. >> >> Here's my analysis: >> 1) in the publisher, alter publication drop table successfully >> removes(PublicationDropTables) the table from the catalogue >> pg_publication_rel >> 2) in the subscriber, alter subscription refresh publication >> successfully removes the table from the catalogue pg_subscription_rel >> (AlterSubscription_refresh->RemoveSubscriptionRel) >> so far so good >> > > Here, it should register the worker to stop on commit, and then on > commit it should call AtEOXact_ApplyLauncher to stop the apply worker. > Once the apply worker is stopped, the corresponding WALSender will > also be stopped. Something here is not happening as per expected > behavior. > >> 3) after the insertion into the table in the publisher(remember that >> it's dropped from the publication in (1)), the walsender process is >> unable detect that the table has been dropped from the publication >> i.e. it doesn't look at the pg_publication_rel catalogue or some >> other, but it only does is_publishable_relation() check which returns >> true in pgoutput_change(). Maybe the walsender should look at the >> catalogue pg_publication_rel in is_publishable_relation()? >> > > We must be somewhere checking pg_publication_rel before sending the > decoded change because otherwise, we would have sent the changes for > the table which are not even part of this publication. I think you can > try to create a separate table that is not part of the publication > under test and see how the changes for that are filtered. I find that pgoutput_change() use a hash table RelationSyncCache to cache the publication info for tables. When we drop tables from the publication, the RelationSyncCache doesn't updated, so it replicate records. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy wrote: > > On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila wrote: > > > > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > > wrote: > > > > > > Hi, > > > > > > While providing thoughts on the design in [1], I found a strange > > > behaviour with the $subject. The use case is shown below as a sequence > > > of steps that need to be run on publisher and subscriber to arrive at > > > the strange behaviour. In step 5, the table is dropped from the > > > publication and in step 6, the refresh publication is run on the > > > subscriber, from here onwards, the expectation is that no further > > > inserts into the publisher table have to be replicated on to the > > > subscriber, but the opposite happens i.e. the inserts are still > > > replicated to the subscriber. ISTM as a bug. Let me know if I'm > > > missing anything. > > > > > > > Did you try to investigate what's going on? Can you please check what > > is the behavior if, after step-5, you restart the subscriber and > > separately try creating a new subscription (maybe on a different > > server) for that publication after step-5 and see if that allows the > > relation to be replicated? AFAIU, in AlterSubscription_refresh, we > > remove such dropped rels and stop their corresponding apply workers > > which should stop the further replication of such relations but that > > doesn't seem to be happening in your case. > > Here's my analysis: > 1) in the publisher, alter publication drop table successfully > removes(PublicationDropTables) the table from the catalogue > pg_publication_rel > 2) in the subscriber, alter subscription refresh publication > successfully removes the table from the catalogue pg_subscription_rel > (AlterSubscription_refresh->RemoveSubscriptionRel) > so far so good > Here, it should register the worker to stop on commit, and then on commit it should call AtEOXact_ApplyLauncher to stop the apply worker. Once the apply worker is stopped, the corresponding WALSender will also be stopped. Something here is not happening as per expected behavior. > 3) after the insertion into the table in the publisher(remember that > it's dropped from the publication in (1)), the walsender process is > unable detect that the table has been dropped from the publication > i.e. it doesn't look at the pg_publication_rel catalogue or some > other, but it only does is_publishable_relation() check which returns > true in pgoutput_change(). Maybe the walsender should look at the > catalogue pg_publication_rel in is_publishable_relation()? > We must be somewhere checking pg_publication_rel before sending the decoded change because otherwise, we would have sent the changes for the table which are not even part of this publication. I think you can try to create a separate table that is not part of the publication under test and see how the changes for that are filtered. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila wrote: > > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > wrote: > > > > Hi, > > > > While providing thoughts on the design in [1], I found a strange > > behaviour with the $subject. The use case is shown below as a sequence > > of steps that need to be run on publisher and subscriber to arrive at > > the strange behaviour. In step 5, the table is dropped from the > > publication and in step 6, the refresh publication is run on the > > subscriber, from here onwards, the expectation is that no further > > inserts into the publisher table have to be replicated on to the > > subscriber, but the opposite happens i.e. the inserts are still > > replicated to the subscriber. ISTM as a bug. Let me know if I'm > > missing anything. > > > > Did you try to investigate what's going on? Can you please check what > is the behavior if, after step-5, you restart the subscriber and > separately try creating a new subscription (maybe on a different > server) for that publication after step-5 and see if that allows the > relation to be replicated? AFAIU, in AlterSubscription_refresh, we > remove such dropped rels and stop their corresponding apply workers > which should stop the further replication of such relations but that > doesn't seem to be happening in your case. Here's my analysis: 1) in the publisher, alter publication drop table successfully removes(PublicationDropTables) the table from the catalogue pg_publication_rel 2) in the subscriber, alter subscription refresh publication successfully removes the table from the catalogue pg_subscription_rel (AlterSubscription_refresh->RemoveSubscriptionRel) so far so good 3) after the insertion into the table in the publisher(remember that it's dropped from the publication in (1)), the walsender process is unable detect that the table has been dropped from the publication i.e. it doesn't look at the pg_publication_rel catalogue or some other, but it only does is_publishable_relation() check which returns true in pgoutput_change(). Maybe the walsender should look at the catalogue pg_publication_rel in is_publishable_relation()? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, 12 Jan 2021 at 13:39, Amit Kapila wrote: > On Tue, Jan 12, 2021 at 9:58 AM japin wrote: >> >> >> On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote: >> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy >> > wrote: >> >> >> >> Hi, >> >> >> >> While providing thoughts on the design in [1], I found a strange >> >> behaviour with the $subject. The use case is shown below as a sequence >> >> of steps that need to be run on publisher and subscriber to arrive at >> >> the strange behaviour. In step 5, the table is dropped from the >> >> publication and in step 6, the refresh publication is run on the >> >> subscriber, from here onwards, the expectation is that no further >> >> inserts into the publisher table have to be replicated on to the >> >> subscriber, but the opposite happens i.e. the inserts are still >> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm >> >> missing anything. >> >> >> > >> > Did you try to investigate what's going on? Can you please check what >> > is the behavior if, after step-5, you restart the subscriber and >> > separately try creating a new subscription (maybe on a different >> > server) for that publication after step-5 and see if that allows the >> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we >> > remove such dropped rels and stop their corresponding apply workers >> > which should stop the further replication of such relations but that >> > doesn't seem to be happening in your case. >> >> If we restart the subscriber after step-5, it will not replicate the records. >> >> As I said in [1], if we don't insert a new data in step-5, it will not >> replicate the records. >> > > Hmm, but in Bharath's test, it is replicating the Inserts in step-7 > and step-9 as well. Are you seeing something different? > Yes, however if we don't Inserts in step-5, the Inserts in step-7 and step-9 will not replicate. -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, Jan 12, 2021 at 9:58 AM japin wrote: > > > On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote: > > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > > wrote: > >> > >> Hi, > >> > >> While providing thoughts on the design in [1], I found a strange > >> behaviour with the $subject. The use case is shown below as a sequence > >> of steps that need to be run on publisher and subscriber to arrive at > >> the strange behaviour. In step 5, the table is dropped from the > >> publication and in step 6, the refresh publication is run on the > >> subscriber, from here onwards, the expectation is that no further > >> inserts into the publisher table have to be replicated on to the > >> subscriber, but the opposite happens i.e. the inserts are still > >> replicated to the subscriber. ISTM as a bug. Let me know if I'm > >> missing anything. > >> > > > > Did you try to investigate what's going on? Can you please check what > > is the behavior if, after step-5, you restart the subscriber and > > separately try creating a new subscription (maybe on a different > > server) for that publication after step-5 and see if that allows the > > relation to be replicated? AFAIU, in AlterSubscription_refresh, we > > remove such dropped rels and stop their corresponding apply workers > > which should stop the further replication of such relations but that > > doesn't seem to be happening in your case. > > If we restart the subscriber after step-5, it will not replicate the records. > > As I said in [1], if we don't insert a new data in step-5, it will not > replicate the records. > Hmm, but in Bharath's test, it is replicating the Inserts in step-7 and step-9 as well. Are you seeing something different? > In both cases, the AlterSubscription_refresh() call RemoveSubscriptionRel() > and logicalrep_worker_stop_at_commit(). However, if we insert a data in > step-5, it doesn't work as expected. Any thoughts? > I think the data inserted in step-5 might be visible because we have stopped the apply process after that but it is not clear why the data inserted in steps 7 and 9 is getting replicated. I think due to some reason apply worker is not getting stopped even after Refresh Publication statement is finished due to which the data is being replicated after that as well and after restart the apply worker won't be restarted so the data replication doesn't happen. -- With Regards, Amit Kapila.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote: > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy > wrote: >> >> Hi, >> >> While providing thoughts on the design in [1], I found a strange >> behaviour with the $subject. The use case is shown below as a sequence >> of steps that need to be run on publisher and subscriber to arrive at >> the strange behaviour. In step 5, the table is dropped from the >> publication and in step 6, the refresh publication is run on the >> subscriber, from here onwards, the expectation is that no further >> inserts into the publisher table have to be replicated on to the >> subscriber, but the opposite happens i.e. the inserts are still >> replicated to the subscriber. ISTM as a bug. Let me know if I'm >> missing anything. >> > > Did you try to investigate what's going on? Can you please check what > is the behavior if, after step-5, you restart the subscriber and > separately try creating a new subscription (maybe on a different > server) for that publication after step-5 and see if that allows the > relation to be replicated? AFAIU, in AlterSubscription_refresh, we > remove such dropped rels and stop their corresponding apply workers > which should stop the further replication of such relations but that > doesn't seem to be happening in your case. If we restart the subscriber after step-5, it will not replicate the records. As I said in [1], if we don't insert a new data in step-5, it will not replicate the records. In both cases, the AlterSubscription_refresh() call RemoveSubscriptionRel() and logicalrep_worker_stop_at_commit(). However, if we insert a data in step-5, it doesn't work as expected. Any thoughts? [1] https://www.postgresql.org/message-id/a7a618fb-f87c-439c-90a3-93cf9e734...@hotmail.com -- Regrads, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy wrote: > > Hi, > > While providing thoughts on the design in [1], I found a strange > behaviour with the $subject. The use case is shown below as a sequence > of steps that need to be run on publisher and subscriber to arrive at > the strange behaviour. In step 5, the table is dropped from the > publication and in step 6, the refresh publication is run on the > subscriber, from here onwards, the expectation is that no further > inserts into the publisher table have to be replicated on to the > subscriber, but the opposite happens i.e. the inserts are still > replicated to the subscriber. ISTM as a bug. Let me know if I'm > missing anything. > Did you try to investigate what's going on? Can you please check what is the behavior if, after step-5, you restart the subscriber and separately try creating a new subscription (maybe on a different server) for that publication after step-5 and see if that allows the relation to be replicated? AFAIU, in AlterSubscription_refresh, we remove such dropped rels and stop their corresponding apply workers which should stop the further replication of such relations but that doesn't seem to be happening in your case. -- With Regards, Amit Kapila.