Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-24 Thread Amit Kapila
On Sat, Jan 23, 2021 at 1:44 PM Bharath Rupireddy
 wrote:
>
> On Sat, Jan 23, 2021 at 11:50 AM Amit Kapila  wrote:
> >
> > On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy
> >  wrote:
> > >
> > > On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila  
> > > wrote:
> > >
> > > Yes you are right. Looks like the above commit is causing the issue. I
> > > reverted that commit and did not see the drop publication bug, see use
> > > case [1].
> > >
> >
> > Okay, thanks for confirming the same. I am planning to push the
> > attached where I made changes in few comments, combined the code and
> > test patch, and made a few changes in the commit message. Let me know
> > if you have any suggestions?
>
> Thanks Amit. Looks like there were some typos in the test case
> description, I corrected them.
>

Thanks, I have pushed the patch.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-23 Thread Bharath Rupireddy
On Sat, Jan 23, 2021 at 11:50 AM Amit Kapila  wrote:
>
> On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy
>  wrote:
> >
> > On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila  wrote:
> >
> > Yes you are right. Looks like the above commit is causing the issue. I
> > reverted that commit and did not see the drop publication bug, see use
> > case [1].
> >
>
> Okay, thanks for confirming the same. I am planning to push the
> attached where I made changes in few comments, combined the code and
> test patch, and made a few changes in the commit message. Let me know
> if you have any suggestions?

Thanks Amit. Looks like there were some typos in the test case
description, I corrected them.

+# Drop the table from publicaiton
+# Add the table to publicaiton again
+# Rerfresh publication after table is added to publication

Attaching v7 patch. I'm fine with it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v7-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behavior.patch
Description: Binary data


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-22 Thread Amit Kapila
On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy
 wrote:
>
> On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila  wrote:
>
> Yes you are right. Looks like the above commit is causing the issue. I
> reverted that commit and did not see the drop publication bug, see use
> case [1].
>

Okay, thanks for confirming the same. I am planning to push the
attached where I made changes in few comments, combined the code and
test patch, and made a few changes in the commit message. Let me know
if you have any suggestions?

-- 
With Regards,
Amit Kapila.


v6-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behavior.patch
Description: Binary data


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-22 Thread Bharath Rupireddy
On Fri, Jan 22, 2021 at 2:59 PM Bharath Rupireddy
 wrote:
> > BTW, have we tried to check if this problem exists in back-branches?
> > It seems to me the problem has been recently introduced by commit
> > 69bd60672a. I am telling this by looking at code and have yet not
> > performed any testing so I could be wrong.
>
> Yes you are right. Looks like the above commit is causing the issue. I
> reverted that commit and did not see the drop publication bug, see use
> case [1].

I think, to be precise the issue is because of not setting pubactions
to false when if (!entry->replicate_valid) is true by the commit
69bd60672. In fact, it removed the correct check. Since found(denotes
whether the cache entry exists or not) will be true even after the
alter publication ... drop table, the cache entry->replicate_valid is
the right check which will be set to false in
rel_sync_cache_publication_cb. But this commit removed it.

And also the reason for not catching this bug during the testing of
69bd60672 commit's patch is that we don't have a test case for alter
publication ... drop table behaviour.

-   if (!found || !entry->replicate_valid)
+   if (!found)
+   {
+   /* immediately make a new entry valid enough to
satisfy callbacks */
+   entry->schema_sent = false;
+   entry->streamed_txns = NIL;
+   entry->replicate_valid = false;
+   entry->pubactions.pubinsert = entry->pubactions.pubupdate =
+   entry->pubactions.pubdelete =
entry->pubactions.pubtruncate = false;
+   entry->publish_as_relid = InvalidOid;
+   }
+
+   /* Validate the entry */
+   if (!entry->replicate_valid)
{
List   *pubids = GetRelationPublications(relid);
ListCell   *lc;
@@ -977,9 +987,6 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
 * relcache considers all publications given relation
is in, but here
 * we only need to consider ones that the subscriber requested.
 */
-   entry->pubactions.pubinsert = entry->pubactions.pubupdate =
-   entry->pubactions.pubdelete =
entry->pubactions.pubtruncate = false;
-

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-22 Thread Bharath Rupireddy
On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila  wrote:
> Thanks for the patch. Few comments:
> +
> +# Test replication with multiple publications for subscription
> +
>
> While checking the existing tests in 001_rep_changes.pl, I came across
> the below test which has multiple publications, won't that suffice the
> need?
>
> "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
> PUBLICATION tap_pub, tap_pub_ins_only"

Yeah I saw that test case earlier, but we couldn't catch the error with it.

Our earlier fix was to set pubactions to false whenever publish is
false in get_rel_sync_entry

foreach(lc, data->publications)
{
//publish is set to false when the given relation's publication is
not in the list of publications provided by subscription i.e.
data->publications
if(!publish)
{
//set all entry->pubactions to false
}
}

The existing use case:
The publication tap_pub has tables tab_rep, tab_full, tab_full2,
tab_mixed, tab_include, tab_nothing, tab_full_pk
The publication tap_pub_ins_only has table tab_ins
CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
PUBLICATION tap_pub, tap_pub_ins_only;
INSERT INTO tab_ins SELECT generate_series(1,50);
And the first insertion after the subscription is created happens into
the table tab_ins which is with tap_pub_ins_only i.e. 2nd publication
in the list of publications specified in the create subscription
statement.

The existing use case analysis:
given relation tab_ins's publication is tap_pub_ins_only
data->publications list contains - tap_pub, tap_pub_ins_only
loop 1: data->publications list 1st element is tap_pub and relation
tab_ins's publication is tap_pub_ins_only, so publish is false,
pubations are set to false.
loop 2: data->publications list 2nd element is tap_pub_ins_only and
relation tab_ins's publication is also tap_pub_ins_only, publish is
true, pubations are set to true,
loop is exited with entry->pubactions set to true, hence the
replication happens for the table.

japin's use case [1]:
But, in the japin use case[1] where we saw regression with the initial
version of the fix, the first insertion happens into the table t1
which is with publication mypub1 i.e. 1st in the list of publications
associated with the subscription.

japin's use case [1] analysis:
given relation t1's publication is mypub1
data->publications list contains - mypub1, mypub2
loop 1: data->publications list 1st element is mypub1 and relation
t1's publication is also mypub1, so publish is true, pubations.insert
is set to true, other actions are false because the publication mypub1
is created WITH (publish='insert');, so the loop can't get exited from
below break condition.
if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
break;
loop 2: data->publications list 2nd element is mypub2 and relation
t1's publication is mypub1, so publish is false, all pubations are set
to false.
loop is exited with all entry->pubactions set to false, hence the
replication doesn't happen for the table t1.

>From the above analysis, it turns out that the existing use case
didn't catch the regression, that is why we had to add a new test
case.

[1]
Step (1)
-- On publisher
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert');
CREATE PUBLICATION mypub2 FOR TABLE t2;

Step (2)
-- On subscriber
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION mypub1, mypub2;

Step (3)
-- On publisher
INSERT INTO t1 VALUES (1);

Step (4)
-- On subscriber
SELECT * FROM t1;

> BTW, have we tried to check if this problem exists in back-branches?
> It seems to me the problem has been recently introduced by commit
> 69bd60672a. I am telling this by looking at code and have yet not
> performed any testing so I could be wrong.

Yes you are right. Looks like the above commit is causing the issue. I
reverted that commit and did not see the drop publication bug, see use
case [1].

[1]
-- On publisher
CREATE TABLE t1 (a int);
INSERT INTO t1 VALUES (1);
CREATE PUBLICATION mypub1 FOR TABLE t1;

-- On subscriber
CREATE TABLE t1 (a int);
CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=5432
dbname=postgres' PUBLICATION mypub1;

-- On publisher
INSERT INTO t1 VALUES (2);

-- On subscriber
SELECT * FROM t1;

-- On publisher
ALTER PUBLICATION mypub1 DROP TABLE t1;
INSERT INTO t1 VALUES (3);

-- On subscriber
SELECT * FROM t1;   --> we should not see value 3, if the alter
publication ... drop table worked correctly.

-- On publisher
INSERT INTO t1 SELECT * FROM generate_series(4, 88);

-- On subscriber
SELECT * FROM t1;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-22 Thread Bharath Rupireddy
On Fri, Jan 22, 2021 at 2:00 PM Amit Kapila  wrote:
>
> On Sat, Jan 16, 2021 at 6:08 PM Bharath Rupireddy
>  wrote:
> >
> > On Sat, Jan 16, 2021 at 12:21 PM Bharath Rupireddy
> >  wrote:
> > > > In the test, can we have multiple publications for the subscription
> > > > because that is how we discovered that the originally proposed patch
> > > > was not correct? Also, is it possible to extend some of the existing
> > > > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
> > > > of the replication setup.
> > >
> > > Sure, I will add the multiple publications use case provided by japin
> > > and also see if I can move the tests from 100_bugs.pl to
> > > 0001_rep_changes.pl.
> >
> > Attaching v5 patch set. 0001 - No change. 0002 - I moved the tests to
> > 001_rep_changes.pl so that we could avoid creation of subscriber,
> > publisher nodes and other set up.
> >
>
> Thanks for the patch. Few comments:
>
> +
> +# Test replication with multiple publications for subscription
> +
>
> While checking the existing tests in 001_rep_changes.pl, I came across
> the below test which has multiple publications, won't that suffice the
> need?
>
> "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
> PUBLICATION tap_pub, tap_pub_ins_only"
>
> BTW, have we tried to check if this problem exists in back-branches?
> It seems to me the problem has been recently introduced by commit
> 69bd60672a. I am telling this by looking at code and have yet not
> performed any testing so I could be wrong.

Thanks for the comments Amit. I will update soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-22 Thread Amit Kapila
On Sat, Jan 16, 2021 at 6:08 PM Bharath Rupireddy
 wrote:
>
> On Sat, Jan 16, 2021 at 12:21 PM Bharath Rupireddy
>  wrote:
> > > In the test, can we have multiple publications for the subscription
> > > because that is how we discovered that the originally proposed patch
> > > was not correct? Also, is it possible to extend some of the existing
> > > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
> > > of the replication setup.
> >
> > Sure, I will add the multiple publications use case provided by japin
> > and also see if I can move the tests from 100_bugs.pl to
> > 0001_rep_changes.pl.
>
> Attaching v5 patch set. 0001 - No change. 0002 - I moved the tests to
> 001_rep_changes.pl so that we could avoid creation of subscriber,
> publisher nodes and other set up.
>

Thanks for the patch. Few comments:

+
+# Test replication with multiple publications for subscription
+

While checking the existing tests in 001_rep_changes.pl, I came across
the below test which has multiple publications, won't that suffice the
need?

"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr'
PUBLICATION tap_pub, tap_pub_ins_only"

BTW, have we tried to check if this problem exists in back-branches?
It seems to me the problem has been recently introduced by commit
69bd60672a. I am telling this by looking at code and have yet not
performed any testing so I could be wrong.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-16 Thread Bharath Rupireddy
On Sat, Jan 16, 2021 at 12:21 PM Bharath Rupireddy
 wrote:
> > In the test, can we have multiple publications for the subscription
> > because that is how we discovered that the originally proposed patch
> > was not correct? Also, is it possible to extend some of the existing
> > tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
> > of the replication setup.
>
> Sure, I will add the multiple publications use case provided by japin
> and also see if I can move the tests from 100_bugs.pl to
> 0001_rep_changes.pl.

Attaching v5 patch set. 0001 - No change. 0002 - I moved the tests to
001_rep_changes.pl so that we could avoid creation of subscriber,
publisher nodes and other set up. I also added the multiple
publication for the subscription test case which was failing with
v2-0001 patch. Note that I had to create subscriber,  publications for
this test case, because I couldn't find the regression (on v2-001
patch) with any of the existing test cases and also I'm dropping them
after the test so that it will not harm any existing following tests.
Hope that's okay. With v5-0002 and v2-0001, the multiple publication
for the subscription test case fails.

Please consider the v5 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 7dc6f91ca7225119b4fe6b3f0869385519721d2c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 16 Jan 2021 11:30:09 +0530
Subject: [PATCH v5 1/1] Fix ALTER PUBLICATION...DROP TABLE behaviour

Currently, in logical replication, publisher/walsender publishes
the tables even though they aren't part of the publication i.e
they are dropped from the publication. Because of this ALTER
PUBLICATION...DROP TABLE doesn't work as expected.

To fix above issue, when the entry got invalidated in
rel_sync_cache_publication_cb(), mark the pubactions to false
and let get_rel_sync_entry() recalculate the pubactions.
---
 src/backend/replication/pgoutput/pgoutput.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 2f01137b42..478cd1f9f5 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1179,5 +1179,18 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
 	 */
 	hash_seq_init(&status, RelationSyncCache);
 	while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
+	{
 		entry->replicate_valid = false;
+
+		/*
+		 * There might be some relations dropped from the publication, we do
+		 * not need to publish the changes for them. Since we cannot get the
+		 * the dropped relations (see above), we reset pubactions for all
+		 * entries.
+		 */
+		entry->pubactions.pubinsert = false;
+		entry->pubactions.pubupdate = false;
+		entry->pubactions.pubdelete = false;
+		entry->pubactions.pubtruncate = false;
+	}
 }
-- 
2.25.1

From 95f4855a1f7ecaa8e2897a154e9347945770e76c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Sat, 16 Jan 2021 17:13:06 +0530
Subject: [PATCH v5 2/2] Test behaviour of ALTER PUBLICATION ... DROP TABLE

---
 src/test/subscription/t/001_rep_changes.pl | 95 +-
 1 file changed, 94 insertions(+), 1 deletion(-)

diff --git a/src/test/subscription/t/001_rep_changes.pl b/src/test/subscription/t/001_rep_changes.pl
index 0680f44a1a..215ccf40c9 100644
--- a/src/test/subscription/t/001_rep_changes.pl
+++ b/src/test/subscription/t/001_rep_changes.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 23;
+use Test::More tests => 27;
 
 # Initialize publisher node
 my $node_publisher = get_new_node('publisher');
@@ -153,6 +153,99 @@ is($result, qq(20|-20|-1),
 $node_publisher->safe_psql('postgres',
 	"INSERT INTO tab_full SELECT generate_series(1,10)");
 
+# Test behaviour of ALTER PUBLICATION ... DROP TABLE
+
+# When publisher drops a table from publication, it should also stop sending
+# its changes to subscriber. We look at the subscriber whether it receives
+# the row that is inserted to the table in the publisher after it is dropped
+# from the publication.
+
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab_ins");
+is($result, qq(1052|1|1002), 'check rows on subscriber before table drop from publication');
+
+# Drop the table from publicaiton
+$node_publisher->safe_psql('postgres',
+	"ALTER PUBLICATION tap_pub_ins_only DROP TABLE tab_ins");
+
+# Insert a row in publisher, but publisher will not send this row to subscriber
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_ins VALUES()");
+
+$node_publisher->wait_for_catchup('tap_sub');
+
+# Subscriber will not receive the inserted row, after table is dropped from
+# publication, so row count should remain the same.
+$result = $node_subscriber->safe_psql('postgres',
+	"SELECT count(*), min(a), max(a) FROM tab_ins");
+is($result, qq(10

Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread Bharath Rupireddy
On Sat, Jan 16, 2021 at 12:02 PM Amit Kapila  wrote:
>
> On Sat, Jan 16, 2021 at 11:52 AM Bharath Rupireddy
>  wrote:
> >
> > I made an entry in the commitfest[1], so that the patches will get a
> > chance to run on all the platforms.
> >
> > Attaching v4 patch set i.e. 0001 - fix, 0002 - test case.
> >
>
> In the test, can we have multiple publications for the subscription
> because that is how we discovered that the originally proposed patch
> was not correct? Also, is it possible to extend some of the existing
> tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
> of the replication setup.

Sure, I will add the multiple publications use case provided by japin
and also see if I can move the tests from 100_bugs.pl to
0001_rep_changes.pl.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread japin


On Sat, 16 Jan 2021 at 14:21, Bharath Rupireddy 
 wrote:
> I made an entry in the commitfest[1], so that the patches will get a
> chance to run on all the platforms.
>
> Attaching v4 patch set i.e. 0001 - fix, 0002 - test case.
>
> [1] - https://commitfest.postgresql.org/32/2944/
>

Thanks for the updated patch.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread Amit Kapila
On Sat, Jan 16, 2021 at 11:52 AM Bharath Rupireddy
 wrote:
>
> I made an entry in the commitfest[1], so that the patches will get a
> chance to run on all the platforms.
>
> Attaching v4 patch set i.e. 0001 - fix, 0002 - test case.
>

In the test, can we have multiple publications for the subscription
because that is how we discovered that the originally proposed patch
was not correct? Also, is it possible to extend some of the existing
tests in 001_rep_changes.pl or anywhere so that we can avoid some cost
of the replication setup.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread Bharath Rupireddy
I made an entry in the commitfest[1], so that the patches will get a
chance to run on all the platforms.

Attaching v4 patch set i.e. 0001 - fix, 0002 - test case.

[1] - https://commitfest.postgresql.org/32/2944/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v4-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behaviour.patch
Description: Binary data


v4-0002-Test-behaviour-of-ALTER-PUBLICATION-.-DROP-TABLE.patch
Description: Binary data


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread Bharath Rupireddy
On Fri, Jan 15, 2021 at 4:54 PM japin  wrote:
> On Fri, 15 Jan 2021 at 18:57, Bharath Rupireddy 
>  wrote:
> > On Fri, Jan 15, 2021 at 4:03 PM Amit Kapila  wrote:
> >> That sounds like a better way to fix and in fact, I was about to
> >> suggest the same after reading your previous email. I'll think more on
> >> this but in the meantime, can you add the test case in the patch as
> >> requested earlier as well.
> >
> > @Li Japin Please let me know if you have already started to work on
> > the test case, otherwise I can make a 0002 patch for the test case and
> > post.
> >
>
> Yeah, I'm working on the test case.  Since I'm not familair with the logical
> replication test, it may take some times.

Thanks a lot. I quickly added the initial use case where we saw the
bug, attached is 0002 patch. Please have a look and add further use
cases if necessary. If okay, it's better to make a cf entry.

I have one comment in v3-0001 patch,
+ * There might some relations dropped from the publication, we do

I think we should change it to - "There might be some relations".

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v3-0002-Test-behaviour-of-ALTER-PUBLICATION-.-DROP-TABLE.patch
Description: Binary data


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread japin


On Fri, 15 Jan 2021 at 18:57, Bharath Rupireddy 
 wrote:
> On Fri, Jan 15, 2021 at 4:03 PM Amit Kapila  wrote:
>> That sounds like a better way to fix and in fact, I was about to
>> suggest the same after reading your previous email. I'll think more on
>> this but in the meantime, can you add the test case in the patch as
>> requested earlier as well.
>
> @Li Japin Please let me know if you have already started to work on
> the test case, otherwise I can make a 0002 patch for the test case and
> post.
>

Yeah, I'm working on the test case.  Since I'm not familair with the logical
replication test, it may take some times.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread Bharath Rupireddy
On Fri, Jan 15, 2021 at 4:03 PM Amit Kapila  wrote:
> That sounds like a better way to fix and in fact, I was about to
> suggest the same after reading your previous email. I'll think more on
> this but in the meantime, can you add the test case in the patch as
> requested earlier as well.

@Li Japin Please let me know if you have already started to work on
the test case, otherwise I can make a 0002 patch for the test case and
post.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread Bharath Rupireddy
On Fri, Jan 15, 2021 at 4:20 PM Amit Kapila  wrote:
> I feel it is better to not do anything for this because neither we
> have a test to reproduce the problem nor is it clear from theory if
> there is any problem to solve here.

+1 to ignore 0002 patch. Thanks Amit.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread Amit Kapila
On Fri, Jan 15, 2021 at 11:48 AM Bharath Rupireddy
 wrote:
>
> On Thu, Jan 14, 2021 at 10:53 AM Amit Kapila  wrote:
> > > 0002 - Invalidates the relation map cache in subscriber syscache
> > > invalidation callbacks. Currently, I'm setting entry->state to
> > > SUBREL_STATE_UNKNOWN in the new invalidation function that's
> > > introduced logicalrep_relmap_invalidate, so that in the next call to
> > > logicalrep_rel_open the entry's state will be read from the system
> > > catalogues using GetSubscriptionRelState. If this is sound's bit
> > > strange, I can add a boolean invalid to LogicalRepRelMapEntry
> > > structure and set that here and in logicalrep_rel_open, I can have
> > > something like:
> > > if (entry->state != SUBREL_STATE_READY || entry->invalid)
> > > entry->state = GetSubscriptionRelState(MySubscription->oid,
> > >entry->localreloid,
> > >&entry->statelsn);
> > >
> > >if (entry->invalid)
> > > entry->invalid = false;
> > >
> >
> > So, the point of the second patch is that it good to have such a
> > thing, otherwise, we don't see any problem if we just use patch-0001,
> > right? I think if we fix the first-one, automatically, we will achieve
> > what we are trying to with the second-one because ultimately
> > logicalrep_relmap_update will be called due to Alter Pub... Drop table
> > .. step and it will mark the entry as SUBREL_STATE_UNKNOWN.
>
> On some further analysis, I found that logicalrep_relmap_update is
> called in subscribers for inserts, delets, updates and truncate
> statements on the dropped(from publication) relations in the
> publisher.
>
> If any alters to pg_subscription, then we invalidate the subscription
> in subscription_change_cb, maybe_reread_subscription subscription will
> take care of re-reading from the system catalogues, see
> apply_handle_->ensure_transaction -> maybe_reread_subscription.
> And we don't have any entry in LogicalRepRelMapEntry that gets
> affected because of changes to pg_subscription, so we don't need to
> invalidate the rel map cache entries in subscription_change_cb.
>
> If any alters to pg_subscription_rel, then there are two parameters in
> LogicalRepRelMapEntry structure, state and statelsn that may get
> affected. Changes to statelsn column is taken care of with the
> invalidation callback invalidate_syncing_table_states setting
> table_states_valid to true and
> process_syncing_tables->process_syncing_tables_for_apply.
>

I think you are saying the reverse here. The function
invalidate_syncing_table_states sets table_states_valid to false and
the other one sets it to true.

> But, if
> state column is changed somehow (although I'm not quite sure how it
> can change to different states 'i', 'd', 's', 'r' after the initial
> table sync phase in logical replication,
>

These states are for the initial copy-phase after that these won't change.

> but we can pretty much do
> something like update pg_subscription_rel set srsubstate = 'd';), in
> this case invalidate_syncing_table_states gets called, but if we don't
> revalidation of relation map cache entries, they will have the old
> state value.
>

Hmm, modifying state values as you are suggesting have much dire
consequences like in this case it can let tablesync worker to restart
and try to do perform initial-copy again or it can lead to missing
data in subscriber. We don't recommend to change system catalogs
manually due to such problems.

> So what I feel is at least we need the 0002 patch but with only
> invalidating the entries in invalidate_syncing_table_states not in
> subscription_change_cb, although I'm not able to back it up with any
> use case or bug as such.
>

I feel it is better to not do anything for this because neither we
have a test to reproduce the problem nor is it clear from theory if
there is any problem to solve here.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread Amit Kapila
On Fri, Jan 15, 2021 at 2:46 PM japin  wrote:
>
> >
> > I'm sorry, the 0001 patch is totally wrong.  If we have only one 
> > publication, it works,
> > however, this is a special case.  If we have multiple publication in one 
> > subscription,
> > it doesn't work.  Here is a usecase.
> >
> > Step (1)
> > -- On publisher
> > CREATE TABLE t1 (a int);
> > CREATE TABLE t2 (a int);
> > CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert');
> > CREATE PUBLICATION mypub2 FOR TABLE t2;
> >
> > Step (2)
> > -- On subscriber
> > CREATE TABLE t1 (a int);
> > CREATE TABLE t2 (a int);
> > CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 
> > dbname=postgres' PUBLICATION mypub1, mypub2;
> >
> > Step (3)
> > -- On publisher
> > INSERT INTO t1 VALUES (1);
> >
> > Step (4)
> > -- On subscriber
> > SELECT * FROM t1;
> >
> > In step (4), we cannot get the records, because there has two publications 
> > in
> > data->publications, so we will check one by one.
> > The first publication is "mypub1" and entry->pubactions.pubinsert will be 
> > set
> > true, however, in the second round, the publication is "mypub2", and we 
> > cannot
> > find pub->oid in pubids (only oid of "mypub1"), so it will set all 
> > entry->pubactions
> > to false, and nothing will be replicate to subscriber.
> >
> > My apologies!
>
> As I mentioned in previous thread, if there has multiple publications in
> single subscription, it might lead data loss.
>
> When the entry got invalidated in rel_sync_cache_publication_cb(), I think we
> should mark the pubactions to false, and let get_rel_sync_entry() recalculate
> the pubactions.
>
> 0001 - modify as above described.
> 0002 - do not change.
>
> Any thought?
>

That sounds like a better way to fix and in fact, I was about to
suggest the same after reading your previous email. I'll think more on
this but in the meantime, can you add the test case in the patch as
requested earlier as well.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-15 Thread japin

On Fri, 15 Jan 2021 at 15:49, japin  wrote:
> On Fri, 15 Jan 2021 at 14:50, Bharath Rupireddy 
>  wrote:
>> On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie  
>> wrote:
>>>
>>> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin  wrote
>>> > > Do we really need to access PUBLICATIONRELMAP in this patch? What if
>>> > > we just set it to false in the else condition of (if (publish &&
>>> > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>>> > >
>>> > > Thank for you review. I agree with you, it doesn’t need to access
>>> > > PUBLICATIONRELMAP, since We already get the publication oid in
>>> > > GetRelationPublications(relid) [1], which also access
>>> > > PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the
>>> > publish is false, so we do not need publish the table.
>>> >
>>> > +1. This is enough.
>>> >
>>> > > I have another question, the data->publications is a list, when did it
>>> > has more then one items?
>>> >
>>> > IIUC, when the single table is associated with multiple publications, then
>>> > data->publications will have multiple entries. Though I have not tried,
>>> > we can try having two or three publications for the same table and verify
>>> > that.
>>> >
>>> > > 0001 - change as you suggest.
>>> > > 0002 - does not change.
>>>
>>>
>>> Hi
>>>
>>> In 0001 patch
>>>
>>> The comments says " Relation is not associated with the publication anymore 
>>> "
>>> That comment was accurate when check is_relation_part_of_publication() in 
>>> the last patch.
>>>
>>> But when put the code in the else branch, not only the case in the 
>>> comments(Relation is dropped),
>>> Some other case such as "partitioned tables" will hit else branch too.
>>>
>>> Do you think we need fix the comment a little to make it accurate?
>>
>> Thanks, that comment needs a rework, in fact the else condition
>> also(because we may fall into else branch even when publish is true
>> and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is
>> false, but our intention(to fix the bug reported here) is to have the
>> else condition only when publish is false. And also instead of just
>> relying on the publish variable which can get set even when the
>> relation is not in the publication but ancestor_published is true, we
>> can just have something like below to fix the bug reported here:
>>
>> if (publish &&
>> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))
>> {
>> entry->pubactions.pubinsert |= pub->pubactions.pubinsert;
>> entry->pubactions.pubupdate |= pub->pubactions.pubupdate;
>> entry->pubactions.pubdelete |= pub->pubactions.pubdelete;
>> entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
>> }
>> else if (!list_member_oid(pubids, pub->oid))
>> {
>> /*
>>  * Relation is not associated with the publication anymore 
>> i.e
>>  * it would have been dropped from the publication. So we 
>> don't
>>  * need to publish the changes for it.
>>  */
>> entry->pubactions.pubinsert = false;
>> entry->pubactions.pubupdate = false;
>> entry->pubactions.pubdelete = false;
>> entry->pubactions.pubtruncate = false;
>> }
>>
>> So this way, the fix only focuses on what we have reported here and it
>> doesn't cause any regressions may be, and the existing comment becomes
>> appropriate.
>>
>> Thoughts?
>>
>
> I'm sorry, the 0001 patch is totally wrong.  If we have only one publication, 
> it works,
> however, this is a special case.  If we have multiple publication in one 
> subscription,
> it doesn't work.  Here is a usecase.
>
> Step (1)
> -- On publisher
> CREATE TABLE t1 (a int);
> CREATE TABLE t2 (a int);
> CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert');
> CREATE PUBLICATION mypub2 FOR TABLE t2;
>
> Step (2)
> -- On subscriber
> CREATE TABLE t1 (a int);
> CREATE TABLE t2 (a int);
> CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 
> dbname=postgres' PUBLICATION mypub1, mypub2;
>
> Step (3)
> -- On publisher
> INSERT INTO t1 VALUES (1);
>
> Step (4)
> -- On subscriber
> SELECT * FROM t1;
>
> In step (4), we cannot get the records, because there has two publications in
> data->publications, so we will check one by one.
> The first publication is "mypub1" and entry->pubactions.pubinsert will be set
> true, however, in the second round, the publication is "mypub2", and we cannot
> find pub->oid in pubids (only oid of "mypub1"), so it will set all 
> entry->pubactions
> to false, and nothing will be replicate to subscriber.
>
> My apologies!

As I mentioned in previous thread, if there has multiple publications in
single subscription, it might lead data loss.

When the entry got invalidated in rel_sync_cache_publication_cb(), I think we
should mark the pubactions to false, and let get

Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread japin


On Fri, 15 Jan 2021 at 14:50, Bharath Rupireddy 
 wrote:
> On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie  
> wrote:
>>
>> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin  wrote
>> > > Do we really need to access PUBLICATIONRELMAP in this patch? What if
>> > > we just set it to false in the else condition of (if (publish &&
>> > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>> > >
>> > > Thank for you review. I agree with you, it doesn’t need to access
>> > > PUBLICATIONRELMAP, since We already get the publication oid in
>> > > GetRelationPublications(relid) [1], which also access
>> > > PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the
>> > publish is false, so we do not need publish the table.
>> >
>> > +1. This is enough.
>> >
>> > > I have another question, the data->publications is a list, when did it
>> > has more then one items?
>> >
>> > IIUC, when the single table is associated with multiple publications, then
>> > data->publications will have multiple entries. Though I have not tried,
>> > we can try having two or three publications for the same table and verify
>> > that.
>> >
>> > > 0001 - change as you suggest.
>> > > 0002 - does not change.
>>
>>
>> Hi
>>
>> In 0001 patch
>>
>> The comments says " Relation is not associated with the publication anymore "
>> That comment was accurate when check is_relation_part_of_publication() in 
>> the last patch.
>>
>> But when put the code in the else branch, not only the case in the 
>> comments(Relation is dropped),
>> Some other case such as "partitioned tables" will hit else branch too.
>>
>> Do you think we need fix the comment a little to make it accurate?
>
> Thanks, that comment needs a rework, in fact the else condition
> also(because we may fall into else branch even when publish is true
> and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is
> false, but our intention(to fix the bug reported here) is to have the
> else condition only when publish is false. And also instead of just
> relying on the publish variable which can get set even when the
> relation is not in the publication but ancestor_published is true, we
> can just have something like below to fix the bug reported here:
>
> if (publish &&
> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))
> {
> entry->pubactions.pubinsert |= pub->pubactions.pubinsert;
> entry->pubactions.pubupdate |= pub->pubactions.pubupdate;
> entry->pubactions.pubdelete |= pub->pubactions.pubdelete;
> entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
> }
> else if (!list_member_oid(pubids, pub->oid))
> {
> /*
>  * Relation is not associated with the publication anymore i.e
>  * it would have been dropped from the publication. So we 
> don't
>  * need to publish the changes for it.
>  */
> entry->pubactions.pubinsert = false;
> entry->pubactions.pubupdate = false;
> entry->pubactions.pubdelete = false;
> entry->pubactions.pubtruncate = false;
> }
>
> So this way, the fix only focuses on what we have reported here and it
> doesn't cause any regressions may be, and the existing comment becomes
> appropriate.
>
> Thoughts?
>

I'm sorry, the 0001 patch is totally wrong.  If we have only one publication, 
it works,
however, this is a special case.  If we have multiple publication in one 
subscription,
it doesn't work.  Here is a usecase.

Step (1)
-- On publisher
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE PUBLICATION mypub1 FOR TABLE t1 WITH (publish='insert');
CREATE PUBLICATION mypub2 FOR TABLE t2;

Step (2)
-- On subscriber
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' 
PUBLICATION mypub1, mypub2;

Step (3)
-- On publisher
INSERT INTO t1 VALUES (1);

Step (4)
-- On subscriber
SELECT * FROM t1;

In step (4), we cannot get the records, because there has two publications in
data->publications, so we will check one by one.
The first publication is "mypub1" and entry->pubactions.pubinsert will be set
true, however, in the second round, the publication is "mypub2", and we cannot
find pub->oid in pubids (only oid of "mypub1"), so it will set all 
entry->pubactions
to false, and nothing will be replicate to subscriber.

My apologies!

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread Bharath Rupireddy
On Fri, Jan 15, 2021 at 11:33 AM Hou, Zhijie  wrote:
>
> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin  wrote
> > > Do we really need to access PUBLICATIONRELMAP in this patch? What if
> > > we just set it to false in the else condition of (if (publish &&
> > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
> > >
> > > Thank for you review. I agree with you, it doesn’t need to access
> > > PUBLICATIONRELMAP, since We already get the publication oid in
> > > GetRelationPublications(relid) [1], which also access
> > > PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the
> > publish is false, so we do not need publish the table.
> >
> > +1. This is enough.
> >
> > > I have another question, the data->publications is a list, when did it
> > has more then one items?
> >
> > IIUC, when the single table is associated with multiple publications, then
> > data->publications will have multiple entries. Though I have not tried,
> > we can try having two or three publications for the same table and verify
> > that.
> >
> > > 0001 - change as you suggest.
> > > 0002 - does not change.
>
>
> Hi
>
> In 0001 patch
>
> The comments says " Relation is not associated with the publication anymore "
> That comment was accurate when check is_relation_part_of_publication() in the 
> last patch.
>
> But when put the code in the else branch, not only the case in the 
> comments(Relation is dropped),
> Some other case such as "partitioned tables" will hit else branch too.
>
> Do you think we need fix the comment a little to make it accurate?

Thanks, that comment needs a rework, in fact the else condition
also(because we may fall into else branch even when publish is true
and (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot) is
false, but our intention(to fix the bug reported here) is to have the
else condition only when publish is false. And also instead of just
relying on the publish variable which can get set even when the
relation is not in the publication but ancestor_published is true, we
can just have something like below to fix the bug reported here:

if (publish &&
(relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))
{
entry->pubactions.pubinsert |= pub->pubactions.pubinsert;
entry->pubactions.pubupdate |= pub->pubactions.pubupdate;
entry->pubactions.pubdelete |= pub->pubactions.pubdelete;
entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
}
else if (!list_member_oid(pubids, pub->oid))
{
/*
 * Relation is not associated with the publication anymore i.e
 * it would have been dropped from the publication. So we don't
 * need to publish the changes for it.
 */
entry->pubactions.pubinsert = false;
entry->pubactions.pubupdate = false;
entry->pubactions.pubdelete = false;
entry->pubactions.pubtruncate = false;
}

So this way, the fix only focuses on what we have reported here and it
doesn't cause any regressions may be, and the existing comment becomes
appropriate.

Thoughts?

> And it seems we can add a "continue;" in else branch.
> Of course this it not necessary.

As you said, it's not necessary because the following if condition
will fail anyways. Having said that, if we add continue, then any
future code that gets added after the following if condition will
never get executed. Though the reasoning may sound silly, IMO let's
not add the continue in the else branch.

if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
break;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread Bharath Rupireddy
On Thu, Jan 14, 2021 at 10:53 AM Amit Kapila  wrote:
> > 0002 - Invalidates the relation map cache in subscriber syscache
> > invalidation callbacks. Currently, I'm setting entry->state to
> > SUBREL_STATE_UNKNOWN in the new invalidation function that's
> > introduced logicalrep_relmap_invalidate, so that in the next call to
> > logicalrep_rel_open the entry's state will be read from the system
> > catalogues using GetSubscriptionRelState. If this is sound's bit
> > strange, I can add a boolean invalid to LogicalRepRelMapEntry
> > structure and set that here and in logicalrep_rel_open, I can have
> > something like:
> > if (entry->state != SUBREL_STATE_READY || entry->invalid)
> > entry->state = GetSubscriptionRelState(MySubscription->oid,
> >entry->localreloid,
> >&entry->statelsn);
> >
> >if (entry->invalid)
> > entry->invalid = false;
> >
>
> So, the point of the second patch is that it good to have such a
> thing, otherwise, we don't see any problem if we just use patch-0001,
> right? I think if we fix the first-one, automatically, we will achieve
> what we are trying to with the second-one because ultimately
> logicalrep_relmap_update will be called due to Alter Pub... Drop table
> .. step and it will mark the entry as SUBREL_STATE_UNKNOWN.

On some further analysis, I found that logicalrep_relmap_update is
called in subscribers for inserts, delets, updates and truncate
statements on the dropped(from publication) relations in the
publisher.

If any alters to pg_subscription, then we invalidate the subscription
in subscription_change_cb, maybe_reread_subscription subscription will
take care of re-reading from the system catalogues, see
apply_handle_->ensure_transaction -> maybe_reread_subscription.
And we don't have any entry in LogicalRepRelMapEntry that gets
affected because of changes to pg_subscription, so we don't need to
invalidate the rel map cache entries in subscription_change_cb.

If any alters to pg_subscription_rel, then there are two parameters in
LogicalRepRelMapEntry structure, state and statelsn that may get
affected. Changes to statelsn column is taken care of with the
invalidation callback invalidate_syncing_table_states setting
table_states_valid to true and
process_syncing_tables->process_syncing_tables_for_apply. But, if
state column is changed somehow (although I'm not quite sure how it
can change to different states 'i', 'd', 's', 'r' after the initial
table sync phase in logical replication, but we can pretty much do
something like update pg_subscription_rel set srsubstate = 'd';), in
this case invalidate_syncing_table_states gets called, but if we don't
revalidation of relation map cache entries, they will have the old
state value.

So what I feel is at least we need the 0002 patch but with only
invalidating the entries in invalidate_syncing_table_states not in
subscription_change_cb, although I'm not able to back it up with any
use case or bug as such.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread Hou, Zhijie
> On Thu, Jan 14, 2021 at 5:36 PM Li Japin  wrote
> > Do we really need to access PUBLICATIONRELMAP in this patch? What if
> > we just set it to false in the else condition of (if (publish &&
> > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
> >
> > Thank for you review. I agree with you, it doesn’t need to access
> > PUBLICATIONRELMAP, since We already get the publication oid in
> > GetRelationPublications(relid) [1], which also access
> > PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the
> publish is false, so we do not need publish the table.
> 
> +1. This is enough.
> 
> > I have another question, the data->publications is a list, when did it
> has more then one items?
> 
> IIUC, when the single table is associated with multiple publications, then
> data->publications will have multiple entries. Though I have not tried,
> we can try having two or three publications for the same table and verify
> that.
> 
> > 0001 - change as you suggest.
> > 0002 - does not change.


Hi

In 0001 patch

The comments says " Relation is not associated with the publication anymore "
That comment was accurate when check is_relation_part_of_publication() in the 
last patch.

But when put the code in the else branch, not only the case in the 
comments(Relation is dropped),
Some other case such as "partitioned tables" will hit else branch too.

Do you think we need fix the comment a little to make it accurate?


And it seems we can add a "continue;" in else branch.
Of course this it not necessary.

Best regards,
houzj





Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread Amit Kapila
On Thu, Jan 14, 2021 at 5:36 PM Li Japin  wrote:
>
> On Jan 14, 2021, at 1:25 PM, Amit Kapila  wrote:
>
> Attaching following two patches:
>
> 0001 - Makes publisher to not publish the changes for the alter
> publication ... dropped tables. Original patch is by japin, I added
> comments, changed the function name and adjusted the code a bit.
>
>
> Do we really need to access PUBLICATIONRELMAP in this patch? What if
> we just set it to false in the else condition of (if (publish &&
> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>
>
> Thank for you review. I agree with you, it doesn’t need to access 
> PUBLICATIONRELMAP, since
> We already get the publication oid in GetRelationPublications(relid) [1], 
> which also access
> PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish 
> is false, so we
> do not need publish the table.
>
> I have another question, the data->publications is a list, when did it has 
> more then one items?
>
> 0001 - change as you suggest.
>

Can we add a test case for this patch as this seems to be a
reproducible bug? We can add the test in
src/test/subscription/100_bugs.pl unless you find a better place to
add it.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread Li Japin

> On Jan 14, 2021, at 8:44 PM, Amit Kapila  wrote:
> 
> On Thu, Jan 14, 2021 at 6:02 PM japin  wrote:
>> 
>> On Thu, 14 Jan 2021 at 20:19, Bharath Rupireddy wrote:
>>> On Thu, Jan 14, 2021 at 5:36 PM Li Japin  wrote
 Do we really need to access PUBLICATIONRELMAP in this patch? What if
 we just set it to false in the else condition of (if (publish &&
 (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
 
 Thank for you review. I agree with you, it doesn’t need to access 
 PUBLICATIONRELMAP, since
 We already get the publication oid in GetRelationPublications(relid) [1], 
 which also access
 PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the 
 publish is false, so we
 do not need publish the table.
>>> 
>>> +1. This is enough.
>>> 
 I have another question, the data->publications is a list, when did it has 
 more then one items?
>>> 
>>> IIUC, when the single table is associated with multiple publications,
>>> then data->publications will have multiple entries. Though I have not
>>> tried, we can try having two or three publications for the same table
>>> and verify that.
>>> 
>> 
>> I try add one table into two publications, but the data->publications has 
>> only
>> one item.  Is there something I missed?
>> 
> 
> I think you will have multiple publications in that list when the
> subscriber has subscribed to multiple publications. For example,
> Create Subscription ... Publication pub1, Publication pub2.
> 

Thanks, you are right! When we create a subscription with multiple publications,
the data->publications has more then one items.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread Amit Kapila
On Thu, Jan 14, 2021 at 6:02 PM japin  wrote:
>
> On Thu, 14 Jan 2021 at 20:19, Bharath Rupireddy wrote:
> > On Thu, Jan 14, 2021 at 5:36 PM Li Japin  wrote
> >> Do we really need to access PUBLICATIONRELMAP in this patch? What if
> >> we just set it to false in the else condition of (if (publish &&
> >> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
> >>
> >> Thank for you review. I agree with you, it doesn’t need to access 
> >> PUBLICATIONRELMAP, since
> >> We already get the publication oid in GetRelationPublications(relid) [1], 
> >> which also access
> >> PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the 
> >> publish is false, so we
> >> do not need publish the table.
> >
> > +1. This is enough.
> >
> >> I have another question, the data->publications is a list, when did it has 
> >> more then one items?
> >
> > IIUC, when the single table is associated with multiple publications,
> > then data->publications will have multiple entries. Though I have not
> > tried, we can try having two or three publications for the same table
> > and verify that.
> >
>
> I try add one table into two publications, but the data->publications has only
> one item.  Is there something I missed?
>

I think you will have multiple publications in that list when the
subscriber has subscribed to multiple publications. For example,
Create Subscription ... Publication pub1, Publication pub2.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread japin


On Thu, 14 Jan 2021 at 20:19, Bharath Rupireddy wrote:
> On Thu, Jan 14, 2021 at 5:36 PM Li Japin  wrote
>> Do we really need to access PUBLICATIONRELMAP in this patch? What if
>> we just set it to false in the else condition of (if (publish &&
>> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>>
>> Thank for you review. I agree with you, it doesn’t need to access 
>> PUBLICATIONRELMAP, since
>> We already get the publication oid in GetRelationPublications(relid) [1], 
>> which also access
>> PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish 
>> is false, so we
>> do not need publish the table.
>
> +1. This is enough.
>
>> I have another question, the data->publications is a list, when did it has 
>> more then one items?
>
> IIUC, when the single table is associated with multiple publications,
> then data->publications will have multiple entries. Though I have not
> tried, we can try having two or three publications for the same table
> and verify that.
>

I try add one table into two publications, but the data->publications has only
one item.  Is there something I missed?

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread Bharath Rupireddy
On Thu, Jan 14, 2021 at 5:36 PM Li Japin  wrote
> Do we really need to access PUBLICATIONRELMAP in this patch? What if
> we just set it to false in the else condition of (if (publish &&
> (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))
>
> Thank for you review. I agree with you, it doesn’t need to access 
> PUBLICATIONRELMAP, since
> We already get the publication oid in GetRelationPublications(relid) [1], 
> which also access
> PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish 
> is false, so we
> do not need publish the table.

+1. This is enough.

> I have another question, the data->publications is a list, when did it has 
> more then one items?

IIUC, when the single table is associated with multiple publications,
then data->publications will have multiple entries. Though I have not
tried, we can try having two or three publications for the same table
and verify that.

> 0001 - change as you suggest.
> 0002 - does not change.

Thanks for the updated patch. I will respond to Amit's previous
comment on the 0002 patch soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-14 Thread Li Japin

On Jan 14, 2021, at 1:25 PM, Amit Kapila 
mailto:amit.kapil...@gmail.com>> wrote:

On Wed, Jan 13, 2021 at 5:40 PM Bharath Rupireddy
mailto:bharath.rupireddyforpostg...@gmail.com>>
 wrote:

On Wed, Jan 13, 2021 at 3:16 PM Bharath Rupireddy
mailto:bharath.rupireddyforpostg...@gmail.com>>
 wrote:

On Wed, Jan 13, 2021 at 2:36 PM japin 
mailto:japi...@hotmail.com>> wrote:
In summary, I feel we need to fix the publisher sending the inserts
even though the table is dropped from the publication, that is the
patch patch proposed by japin. This solves the bug reported in this
thread.

And also, it's good to have the LogicalRepRelMap invalidation fix as a
0002 patch in invalidate_syncing_table_states, subscription_change_cb
and logicalrep_rel_open as proposed by me.

Thoughts?


I think invalidate the LogicalRepRelMap is necessary.  If the table isn't in
subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap?

IIUC, it's not possible to know the relid of the alter
publication...dropped table in the invalidation callbacks
invalidate_syncing_table_states or subscription_change_cb, so it's
better to set state of all the entries to SUBREL_STATE_UNKNOWN, so
that, in the next logicalrep_rel_open call the function
GetSubscriptionRelState will be called and the pg_subscription_rel
will be read freshly.

This is inline with the reasoning specified at [1].

[1] - 
https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com

Attaching following two patches:

0001 - Makes publisher to not publish the changes for the alter
publication ... dropped tables. Original patch is by japin, I added
comments, changed the function name and adjusted the code a bit.


Do we really need to access PUBLICATIONRELMAP in this patch? What if
we just set it to false in the else condition of (if (publish &&
(relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))


Thank for you review. I agree with you, it doesn’t need to access 
PUBLICATIONRELMAP, since
We already get the publication oid in GetRelationPublications(relid) [1], which 
also access
PUBLICATIONRELMAP.  If the current pub->oid does not in pubids, the publish is 
false, so we
do not need publish the table.

I have another question, the data->publications is a list, when did it has more 
then one items?

0001 - change as you suggest.
0002 - does not change.

Please consider v2 for further review.

[1]
List *
GetRelationPublications(Oid relid)
{
List   *result = NIL;
CatCList   *pubrellist;
int i;

/* Find all publications associated with the relation. */
pubrellist = SearchSysCacheList1(PUBLICATIONRELMAP,
 ObjectIdGetDatum(relid));
for (i = 0; i < pubrellist->n_members; i++)
{
HeapTuple   tup = &pubrellist->members[i]->tuple;
Oid pubid = ((Form_pg_publication_rel) GETSTRUCT(tup))->prpubid;

result = lappend_oid(result, pubid);
}

ReleaseSysCacheList(pubrellist);

return result;
}


--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



v2-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behaviour.patch
Description: v2-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behaviour.patch


v2-0002-Invalidate-relation-map-cache-in-subscriber-sysca.patch
Description:  v2-0002-Invalidate-relation-map-cache-in-subscriber-sysca.patch


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Amit Kapila
On Wed, Jan 13, 2021 at 5:40 PM Bharath Rupireddy
 wrote:
>
> On Wed, Jan 13, 2021 at 3:16 PM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Jan 13, 2021 at 2:36 PM japin  wrote:
> > > > In summary, I feel we need to fix the publisher sending the inserts
> > > > even though the table is dropped from the publication, that is the
> > > > patch patch proposed by japin. This solves the bug reported in this
> > > > thread.
> > > >
> > > > And also, it's good to have the LogicalRepRelMap invalidation fix as a
> > > > 0002 patch in invalidate_syncing_table_states, subscription_change_cb
> > > > and logicalrep_rel_open as proposed by me.
> > > >
> > > > Thoughts?
> > > >
> > >
> > > I think invalidate the LogicalRepRelMap is necessary.  If the table isn't 
> > > in
> > > subscription, can we remove the LogicalRepRelMapEntry from 
> > > LogicalRepRelMap?
> >
> > IIUC, it's not possible to know the relid of the alter
> > publication...dropped table in the invalidation callbacks
> > invalidate_syncing_table_states or subscription_change_cb, so it's
> > better to set state of all the entries to SUBREL_STATE_UNKNOWN, so
> > that, in the next logicalrep_rel_open call the function
> > GetSubscriptionRelState will be called and the pg_subscription_rel
> > will be read freshly.
> >
> > This is inline with the reasoning specified at [1].
> >
> > [1] - 
> > https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com
>
> Attaching following two patches:
>
> 0001 - Makes publisher to not publish the changes for the alter
> publication ... dropped tables. Original patch is by japin, I added
> comments, changed the function name and adjusted the code a bit.
>

Do we really need to access PUBLICATIONRELMAP in this patch? What if
we just set it to false in the else condition of (if (publish &&
(relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot)))

> 0002 - Invalidates the relation map cache in subscriber syscache
> invalidation callbacks. Currently, I'm setting entry->state to
> SUBREL_STATE_UNKNOWN in the new invalidation function that's
> introduced logicalrep_relmap_invalidate, so that in the next call to
> logicalrep_rel_open the entry's state will be read from the system
> catalogues using GetSubscriptionRelState. If this is sound's bit
> strange, I can add a boolean invalid to LogicalRepRelMapEntry
> structure and set that here and in logicalrep_rel_open, I can have
> something like:
> if (entry->state != SUBREL_STATE_READY || entry->invalid)
> entry->state = GetSubscriptionRelState(MySubscription->oid,
>entry->localreloid,
>&entry->statelsn);
>
>if (entry->invalid)
> entry->invalid = false;
>

So, the point of the second patch is that it good to have such a
thing, otherwise, we don't see any problem if we just use patch-0001,
right? I think if we fix the first-one, automatically, we will achieve
what we are trying to with the second-one because ultimately
logicalrep_relmap_update will be called due to Alter Pub... Drop table
.. step and it will mark the entry as SUBREL_STATE_UNKNOWN.

> make check and make check-world passes on both patches.
>
> If okay with the fixes, I will try to add a test case and also a cf
> entry so that these patches get a chance to be tested on all the
> platforms.
>

I think it is good to follow both the steps (adding a testcase and
registering it to CF).

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Bharath Rupireddy
On Wed, Jan 13, 2021 at 3:16 PM Bharath Rupireddy
 wrote:
>
> On Wed, Jan 13, 2021 at 2:36 PM japin  wrote:
> > > In summary, I feel we need to fix the publisher sending the inserts
> > > even though the table is dropped from the publication, that is the
> > > patch patch proposed by japin. This solves the bug reported in this
> > > thread.
> > >
> > > And also, it's good to have the LogicalRepRelMap invalidation fix as a
> > > 0002 patch in invalidate_syncing_table_states, subscription_change_cb
> > > and logicalrep_rel_open as proposed by me.
> > >
> > > Thoughts?
> > >
> >
> > I think invalidate the LogicalRepRelMap is necessary.  If the table isn't in
> > subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap?
>
> IIUC, it's not possible to know the relid of the alter
> publication...dropped table in the invalidation callbacks
> invalidate_syncing_table_states or subscription_change_cb, so it's
> better to set state of all the entries to SUBREL_STATE_UNKNOWN, so
> that, in the next logicalrep_rel_open call the function
> GetSubscriptionRelState will be called and the pg_subscription_rel
> will be read freshly.
>
> This is inline with the reasoning specified at [1].
>
> [1] - 
> https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com

Attaching following two patches:

0001 - Makes publisher to not publish the changes for the alter
publication ... dropped tables. Original patch is by japin, I added
comments, changed the function name and adjusted the code a bit.
0002 - Invalidates the relation map cache in subscriber syscache
invalidation callbacks. Currently, I'm setting entry->state to
SUBREL_STATE_UNKNOWN in the new invalidation function that's
introduced logicalrep_relmap_invalidate, so that in the next call to
logicalrep_rel_open the entry's state will be read from the system
catalogues using GetSubscriptionRelState. If this is sound's bit
strange, I can add a boolean invalid to LogicalRepRelMapEntry
structure and set that here and in logicalrep_rel_open, I can have
something like:
if (entry->state != SUBREL_STATE_READY || entry->invalid)
entry->state = GetSubscriptionRelState(MySubscription->oid,
   entry->localreloid,
   &entry->statelsn);

   if (entry->invalid)
entry->invalid = false;

make check and make check-world passes on both patches.

If okay with the fixes, I will try to add a test case and also a cf
entry so that these patches get a chance to be tested on all the
platforms.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Fix-ALTER-PUBLICATION.DROP-TABLE-behaviour.patch
Description: Binary data


v1-0002-Invalidate-relation-map-cache-in-subscriber-sysca.patch
Description: Binary data


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Bharath Rupireddy
On Wed, Jan 13, 2021 at 5:15 PM Dilip Kumar  wrote:
>
> On Wed, Jan 13, 2021 at 3:08 PM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar  wrote:
> > > > IIUC the logical replication only replicate the tables in publication, 
> > > > I think
> > > > when the tables that aren't in publication should not be replicated.
> > > >
> > > > Attached the patch that fixes it.  Thought?
> > > >
> > >
> > > Instead of doing this, I would expect that the RelationSyncCache entry
> > > should be removed when the relation is dropped from the publication.
> > > So if that is done then it will reload the publication and then it
> > > will not find that that relation as published and it will ignore the
> > > changes.  But the patch doesn't seem to be exactly on that line.  Am I
> > > missing something here?
> >
> > IIUC, it's not possible to remove the cache entry inside
> > rel_sync_cache_publication_cb, because we don't receive the relid of
> > the alter publication .. dropped relation in the invalidation
> > callback. See the below comment,
> >
> > /*
> >  * There is no way to find which entry in our cache the hash belongs to 
> > so
> >  * mark the whole cache as invalid.
> >  */
> > hash_seq_init(&status, RelationSyncCache);
> > while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
> > entry->replicate_valid = false;
>
> So basically your point is that all the RelationSyncEntry are getting
> invalidated right?  And if so then in get_rel_sync_entry we will just
> check the updated publication no?
> I am referring to the below code.  So ideally this should be already
> working if the relcaches are getting invalidated?
>
> get_rel_sync_entry
> {
> entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
> (void *) &relid,
> HASH_ENTER, &found);
> Assert(entry != NULL);
>
> .
>
> /* Validate the entry */
> if (!entry->replicate_valid)
> {
> List *pubids = GetRelationPublications(relid);
> ListCell *lc;
> Oid publish_as_relid = relid;
> ... here we will rechek the pulication
> }

Yes, after the entries got invalidated in
rel_sync_cache_publication_cb, we get to the if
(!entry->replicate_valid) part of the code in get_rel_sync_entry, but
in below point, we don't mark the pubactions to false, which were
earlier set to true. Because of this, we will still have pubactions to
true because of which pgoutput_change publishes the changes.

/*
 * Don't publish changes for partitioned tables, because
 * publishing those of its partitions suffices, unless partition
 * changes won't be published due to pubviaroot being set.
 */
if (publish &&
(relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))
{
entry->pubactions.pubinsert |= pub->pubactions.pubinsert;
entry->pubactions.pubupdate |= pub->pubactions.pubupdate;
entry->pubactions.pubdelete |= pub->pubactions.pubdelete;
entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
}

if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
break;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Dilip Kumar
On Wed, Jan 13, 2021 at 3:08 PM Bharath Rupireddy
 wrote:
>
> On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar  wrote:
> > > IIUC the logical replication only replicate the tables in publication, I 
> > > think
> > > when the tables that aren't in publication should not be replicated.
> > >
> > > Attached the patch that fixes it.  Thought?
> > >
> >
> > Instead of doing this, I would expect that the RelationSyncCache entry
> > should be removed when the relation is dropped from the publication.
> > So if that is done then it will reload the publication and then it
> > will not find that that relation as published and it will ignore the
> > changes.  But the patch doesn't seem to be exactly on that line.  Am I
> > missing something here?
>
> IIUC, it's not possible to remove the cache entry inside
> rel_sync_cache_publication_cb, because we don't receive the relid of
> the alter publication .. dropped relation in the invalidation
> callback. See the below comment,
>
> /*
>  * There is no way to find which entry in our cache the hash belongs to so
>  * mark the whole cache as invalid.
>  */
> hash_seq_init(&status, RelationSyncCache);
> while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
> entry->replicate_valid = false;

So basically your point is that all the RelationSyncEntry are getting
invalidated right?  And if so then in get_rel_sync_entry we will just
check the updated publication no?
I am referring to the below code.  So ideally this should be already
working if the relcaches are getting invalidated?

get_rel_sync_entry
{
entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
(void *) &relid,
HASH_ENTER, &found);
Assert(entry != NULL);

.

/* Validate the entry */
if (!entry->replicate_valid)
{
List *pubids = GetRelationPublications(relid);
ListCell *lc;
Oid publish_as_relid = relid;
... here we will rechek the pulication
}

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Bharath Rupireddy
On Wed, Jan 13, 2021 at 2:36 PM japin  wrote:
> > In summary, I feel we need to fix the publisher sending the inserts
> > even though the table is dropped from the publication, that is the
> > patch patch proposed by japin. This solves the bug reported in this
> > thread.
> >
> > And also, it's good to have the LogicalRepRelMap invalidation fix as a
> > 0002 patch in invalidate_syncing_table_states, subscription_change_cb
> > and logicalrep_rel_open as proposed by me.
> >
> > Thoughts?
> >
>
> I think invalidate the LogicalRepRelMap is necessary.  If the table isn't in
> subscription, can we remove the LogicalRepRelMapEntry from LogicalRepRelMap?

IIUC, it's not possible to know the relid of the alter
publication...dropped table in the invalidation callbacks
invalidate_syncing_table_states or subscription_change_cb, so it's
better to set state of all the entries to SUBREL_STATE_UNKNOWN, so
that, in the next logicalrep_rel_open call the function
GetSubscriptionRelState will be called and the pg_subscription_rel
will be read freshly.

This is inline with the reasoning specified at [1].

[1] - 
https://www.postgresql.org/message-id/CAFiTN-udwuc_h0TaFrSEKb-Yo1vVvkjz9TDRw7VE3P-KiPSGbQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Dilip Kumar
On Wed, Jan 13, 2021 at 3:08 PM Bharath Rupireddy
 wrote:
>
> On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar  wrote:
> > > IIUC the logical replication only replicate the tables in publication, I 
> > > think
> > > when the tables that aren't in publication should not be replicated.
> > >
> > > Attached the patch that fixes it.  Thought?
> > >
> >
> > Instead of doing this, I would expect that the RelationSyncCache entry
> > should be removed when the relation is dropped from the publication.
> > So if that is done then it will reload the publication and then it
> > will not find that that relation as published and it will ignore the
> > changes.  But the patch doesn't seem to be exactly on that line.  Am I
> > missing something here?
>
> IIUC, it's not possible to remove the cache entry inside
> rel_sync_cache_publication_cb, because we don't receive the relid of
> the alter publication .. dropped relation in the invalidation
> callback. See the below comment,

Hmm, yeah because nothing changed to the relation the change is for
the publication so the invalidation is not registered for the relation
entry.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Dilip Kumar
On Wed, Jan 13, 2021 at 3:06 PM japin  wrote:
>
>
> On Wed, 13 Jan 2021 at 17:23, Dilip Kumar wrote:
> > On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
> >>
> >>
> >> On Jan 12, 2021, at 5:47 PM, japin  wrote:
> >>
> >>
> >> On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
> >>
> >> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
> >>  wrote:
> >>
> >>
> >> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila  
> >> wrote:
> >>
> >>
> >> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> >>  wrote:
> >>
> >>
> >> Hi,
> >>
> >> While providing thoughts on the design in [1], I found a strange
> >> behaviour with the $subject. The use case is shown below as a sequence
> >> of steps that need to be run on publisher and subscriber to arrive at
> >> the strange behaviour.  In step 5, the table is dropped from the
> >> publication and in step 6, the refresh publication is run on the
> >> subscriber, from here onwards, the expectation is that no further
> >> inserts into the publisher table have to be replicated on to the
> >> subscriber, but the opposite happens i.e. the inserts are still
> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> >> missing anything.
> >>
> >>
> >> Did you try to investigate what's going on? Can you please check what
> >> is the behavior if, after step-5, you restart the subscriber and
> >> separately try creating a new subscription (maybe on a different
> >> server) for that publication after step-5 and see if that allows the
> >> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> >> remove such dropped rels and stop their corresponding apply workers
> >> which should stop the further replication of such relations but that
> >> doesn't seem to be happening in your case.
> >>
> >>
> >> Here's my analysis:
> >> 1) in the publisher, alter publication drop table successfully
> >> removes(PublicationDropTables) the table from the catalogue
> >> pg_publication_rel
> >> 2) in the subscriber, alter subscription refresh publication
> >> successfully removes the table from the catalogue pg_subscription_rel
> >> (AlterSubscription_refresh->RemoveSubscriptionRel)
> >> so far so good
> >>
> >>
> >> Here, it should register the worker to stop on commit, and then on
> >> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> >> Once the apply worker is stopped, the corresponding WALSender will
> >> also be stopped. Something here is not happening as per expected
> >> behavior.
> >>
> >> 3) after the insertion into the table in the publisher(remember that
> >> it's dropped from the publication in (1)), the walsender process is
> >> unable detect that the table has been dropped from the publication
> >> i.e. it doesn't look at the pg_publication_rel catalogue or some
> >> other, but it only does is_publishable_relation() check which returns
> >> true in pgoutput_change(). Maybe the walsender should look at the
> >> catalogue pg_publication_rel in is_publishable_relation()?
> >>
> >>
> >> We must be somewhere checking pg_publication_rel before sending the
> >> decoded change because otherwise, we would have sent the changes for
> >> the table which are not even part of this publication. I think you can
> >> try to create a separate table that is not part of the publication
> >> under test and see how the changes for that are filtered.
> >>
> >>
> >> I find that pgoutput_change() use a hash table RelationSyncCache to
> >> cache the publication info for tables.  When we drop tables from the
> >> publication, the RelationSyncCache doesn't updated, so it replicate
> >> records.
> >>
> >>
> >> IIUC the logical replication only replicate the tables in publication, I 
> >> think
> >> when the tables that aren't in publication should not be replicated.
> >>
> >> Attached the patch that fixes it.  Thought?
> >>
> >
> > Instead of doing this, I would expect that the RelationSyncCache entry
> > should be removed when the relation is dropped from the publication.
> > So if that is done then it will reload the publication and then it
> > will not find that that relation as published and it will ignore the
> > changes.  But the patch doesn't seem to be exactly on that line.  Am I
> > missing something here?
>
> Even if we remove RelationSyncCache entry when the relation is dropped from
> the publication, when the relation is change (insert/update), AFAIK it also
> create an entry in RelationSyncCache, so removing the RelationSyncCache entry
> is unnecessary.
>

That is true but the entry->replicate_valid will be false and that
will be only set to true in get_rel_sync_entry.  So now based on the
new publication the pubaction will be updated ?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Bharath Rupireddy
On Wed, Jan 13, 2021 at 2:53 PM Dilip Kumar  wrote:
> > IIUC the logical replication only replicate the tables in publication, I 
> > think
> > when the tables that aren't in publication should not be replicated.
> >
> > Attached the patch that fixes it.  Thought?
> >
>
> Instead of doing this, I would expect that the RelationSyncCache entry
> should be removed when the relation is dropped from the publication.
> So if that is done then it will reload the publication and then it
> will not find that that relation as published and it will ignore the
> changes.  But the patch doesn't seem to be exactly on that line.  Am I
> missing something here?

IIUC, it's not possible to remove the cache entry inside
rel_sync_cache_publication_cb, because we don't receive the relid of
the alter publication .. dropped relation in the invalidation
callback. See the below comment,

/*
 * There is no way to find which entry in our cache the hash belongs to so
 * mark the whole cache as invalid.
 */
hash_seq_init(&status, RelationSyncCache);
while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
entry->replicate_valid = false;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread japin


On Wed, 13 Jan 2021 at 17:23, Dilip Kumar wrote:
> On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
>>
>>
>> On Jan 12, 2021, at 5:47 PM, japin  wrote:
>>
>>
>> On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
>>
>> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
>>  wrote:
>>
>>
>> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila  wrote:
>>
>>
>> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>>  wrote:
>>
>>
>> Hi,
>>
>> While providing thoughts on the design in [1], I found a strange
>> behaviour with the $subject. The use case is shown below as a sequence
>> of steps that need to be run on publisher and subscriber to arrive at
>> the strange behaviour.  In step 5, the table is dropped from the
>> publication and in step 6, the refresh publication is run on the
>> subscriber, from here onwards, the expectation is that no further
>> inserts into the publisher table have to be replicated on to the
>> subscriber, but the opposite happens i.e. the inserts are still
>> replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> missing anything.
>>
>>
>> Did you try to investigate what's going on? Can you please check what
>> is the behavior if, after step-5, you restart the subscriber and
>> separately try creating a new subscription (maybe on a different
>> server) for that publication after step-5 and see if that allows the
>> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
>> remove such dropped rels and stop their corresponding apply workers
>> which should stop the further replication of such relations but that
>> doesn't seem to be happening in your case.
>>
>>
>> Here's my analysis:
>> 1) in the publisher, alter publication drop table successfully
>> removes(PublicationDropTables) the table from the catalogue
>> pg_publication_rel
>> 2) in the subscriber, alter subscription refresh publication
>> successfully removes the table from the catalogue pg_subscription_rel
>> (AlterSubscription_refresh->RemoveSubscriptionRel)
>> so far so good
>>
>>
>> Here, it should register the worker to stop on commit, and then on
>> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
>> Once the apply worker is stopped, the corresponding WALSender will
>> also be stopped. Something here is not happening as per expected
>> behavior.
>>
>> 3) after the insertion into the table in the publisher(remember that
>> it's dropped from the publication in (1)), the walsender process is
>> unable detect that the table has been dropped from the publication
>> i.e. it doesn't look at the pg_publication_rel catalogue or some
>> other, but it only does is_publishable_relation() check which returns
>> true in pgoutput_change(). Maybe the walsender should look at the
>> catalogue pg_publication_rel in is_publishable_relation()?
>>
>>
>> We must be somewhere checking pg_publication_rel before sending the
>> decoded change because otherwise, we would have sent the changes for
>> the table which are not even part of this publication. I think you can
>> try to create a separate table that is not part of the publication
>> under test and see how the changes for that are filtered.
>>
>>
>> I find that pgoutput_change() use a hash table RelationSyncCache to
>> cache the publication info for tables.  When we drop tables from the
>> publication, the RelationSyncCache doesn't updated, so it replicate
>> records.
>>
>>
>> IIUC the logical replication only replicate the tables in publication, I 
>> think
>> when the tables that aren't in publication should not be replicated.
>>
>> Attached the patch that fixes it.  Thought?
>>
>
> Instead of doing this, I would expect that the RelationSyncCache entry
> should be removed when the relation is dropped from the publication.
> So if that is done then it will reload the publication and then it
> will not find that that relation as published and it will ignore the
> changes.  But the patch doesn't seem to be exactly on that line.  Am I
> missing something here?

Even if we remove RelationSyncCache entry when the relation is dropped from
the publication, when the relation is change (insert/update), AFAIK it also
create an entry in RelationSyncCache, so removing the RelationSyncCache entry
is unnecessary.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Dilip Kumar
On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
>
>
> On Jan 12, 2021, at 5:47 PM, japin  wrote:
>
>
> On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
>
> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
>  wrote:
>
>
> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila  wrote:
>
>
> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>  wrote:
>
>
> Hi,
>
> While providing thoughts on the design in [1], I found a strange
> behaviour with the $subject. The use case is shown below as a sequence
> of steps that need to be run on publisher and subscriber to arrive at
> the strange behaviour.  In step 5, the table is dropped from the
> publication and in step 6, the refresh publication is run on the
> subscriber, from here onwards, the expectation is that no further
> inserts into the publisher table have to be replicated on to the
> subscriber, but the opposite happens i.e. the inserts are still
> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> missing anything.
>
>
> Did you try to investigate what's going on? Can you please check what
> is the behavior if, after step-5, you restart the subscriber and
> separately try creating a new subscription (maybe on a different
> server) for that publication after step-5 and see if that allows the
> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> remove such dropped rels and stop their corresponding apply workers
> which should stop the further replication of such relations but that
> doesn't seem to be happening in your case.
>
>
> Here's my analysis:
> 1) in the publisher, alter publication drop table successfully
> removes(PublicationDropTables) the table from the catalogue
> pg_publication_rel
> 2) in the subscriber, alter subscription refresh publication
> successfully removes the table from the catalogue pg_subscription_rel
> (AlterSubscription_refresh->RemoveSubscriptionRel)
> so far so good
>
>
> Here, it should register the worker to stop on commit, and then on
> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> Once the apply worker is stopped, the corresponding WALSender will
> also be stopped. Something here is not happening as per expected
> behavior.
>
> 3) after the insertion into the table in the publisher(remember that
> it's dropped from the publication in (1)), the walsender process is
> unable detect that the table has been dropped from the publication
> i.e. it doesn't look at the pg_publication_rel catalogue or some
> other, but it only does is_publishable_relation() check which returns
> true in pgoutput_change(). Maybe the walsender should look at the
> catalogue pg_publication_rel in is_publishable_relation()?
>
>
> We must be somewhere checking pg_publication_rel before sending the
> decoded change because otherwise, we would have sent the changes for
> the table which are not even part of this publication. I think you can
> try to create a separate table that is not part of the publication
> under test and see how the changes for that are filtered.
>
>
> I find that pgoutput_change() use a hash table RelationSyncCache to
> cache the publication info for tables.  When we drop tables from the
> publication, the RelationSyncCache doesn't updated, so it replicate
> records.
>
>
> IIUC the logical replication only replicate the tables in publication, I think
> when the tables that aren't in publication should not be replicated.
>
> Attached the patch that fixes it.  Thought?
>

Instead of doing this, I would expect that the RelationSyncCache entry
should be removed when the relation is dropped from the publication.
So if that is done then it will reload the publication and then it
will not find that that relation as published and it will ignore the
changes.  But the patch doesn't seem to be exactly on that line.  Am I
missing something here?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread japin


On Wed, 13 Jan 2021 at 16:49, Bharath Rupireddy wrote:
> On Wed, Jan 13, 2021 at 11:27 AM Amit Kapila  wrote:
>>
>> On Wed, Jan 13, 2021 at 11:08 AM Bharath Rupireddy
>>  wrote:
>> >
>> > On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila  
>> > wrote:
>> > >
>> > > On Tue, Jan 12, 2021 at 5:23 PM japin  wrote:
>> > > >
>> > > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
>> > > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
>> > > > >> IIUC the logical replication only replicate the tables in 
>> > > > >> publication, I think
>> > > > >> when the tables that aren't in publication should not be replicated.
>> > > > >>
>> > > > >> Attached the patch that fixes it.  Thought?
>> > > > >
>> > > > > With that change, we don't get the behaviour that's stated in the
>> > > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove
>> > > > > one or more tables from the publication. Note that adding tables to a
>> > > > > publication that is already subscribed to will require a ALTER
>> > > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side 
>> > > > > in
>> > > > > order to become effective" -
>> > > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
>> > > > >
>> > > >
>> > > > The documentation only emphasize adding tables to a publication, not
>> > > > include dropping tables from a publication.
>> > > >
>> > >
>> > > Right and I think that is ensured by the subscriber by calling
>> > > should_apply_changes_for_rel() which won't return true unless the
>> > > newly added relation is not synced via Refresh Publication. So, this
>> > > means with or without this patch we should be sending the changes of
>> > > the newly published table from the publisher?
>> >
>> > Oh, my bad, alter subscription...refresh publication is required only when 
>> > the tables are added to the publisher. Patch by japin makes the walsender 
>> > process to stop sending the data to the subscriber/apply worker. The patch 
>> > is based on the idea of looking at the PUBLICATIONRELMAP in 
>> > get_rel_sync_entry when the entries have been invalidated in 
>> > rel_sync_cache_publication_cb because of alter publication...drop table.
>> > ,
>> > When the alter subscription...refresh publication is run on the 
>> > subscriber, the SUBSCRIPTIONRELMAP catalogue gets invalidated but the 
>> > corresponding cache entries in the LogicalRepRelMap which is used by 
>> > logicalrep_rel_open are not invalidated. LogicalRepRelMap is used to know 
>> > the relations that are associated with the subscription. But we seem to 
>> > have not taken care of invalidating those entries, though we have the 
>> > invalidation callback invalidate_syncing_table_states registered for 
>> > SUBSCRIPTIONRELMAP in ApplyWorkerMain. So, we miss to work on updating the 
>> > entries in LogicalRepRelMap.
>> >
>> > IMO, the ideal way to fix this issue is 1) stop the walsender sending the 
>> > changes to dropped tables, for this japin patch works 2) we must mark all 
>> > the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states 
>> > so that in the next logicalrep_rel_open, if the entry is invalidated, then 
>> > we have to call GetSubscriptionRelState to get the latest state, as shown 
>> > in [1]. Likewise, we might also have to mark the cache entries invalid in  
>> > subscription_change_cb which is invalidation callback for pg_subscription
>> >
>>
>> Is the second point you described here is related to the original
>> bug-reported or will it cause some other problem?
>
> It can cause some other problems. I found this while investigating
> from the subscriber perspective why the subscriber is accepting the
> inserts even though the relation is removed from the
> pg_subscription_rel catalogue after refresh publication. I ended up
> finding that the cache entries are not being invalidated in
> invalidate_syncing_table_states.
>
> Having said that, the patch proposed by japin is enough to solve the
> bug reported here.
>
> While we are fixing the bug, I thought it's better to fix this as
> well, maybe as a 0002 patch? If okay, I can work on the patch and post
> it in a separate thread?
>
>> > Thoughts?
>> >
>> > [1] -
>> > if (entry->state != SUBREL_STATE_READY || entry->invalid)
>> > entry->state = GetSubscriptionRelState(MySubscription->oid,
>> >entry->localreloid,
>> >&entry->statelsn);
>> >
>> >if (entry->invalid)
>> >entry->invalid = false;
>> >
>> > return entry;
>> >
>> > > I have another question on your patch which is why in some cases like
>> > > when we have not inserted in step-5 (as mentioned by you) the
>> > > following insertions are not sent. Is somehow we are setting the
>> > > pubactions as false in that case, if so, how?
>> >
>> > The reason is that the issue reported in this thread occurs - when we have 
>> > the walsender process running, RelationSyn

Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-13 Thread Bharath Rupireddy
On Wed, Jan 13, 2021 at 11:27 AM Amit Kapila  wrote:
>
> On Wed, Jan 13, 2021 at 11:08 AM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila  
> > wrote:
> > >
> > > On Tue, Jan 12, 2021 at 5:23 PM japin  wrote:
> > > >
> > > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> > > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
> > > > >> IIUC the logical replication only replicate the tables in 
> > > > >> publication, I think
> > > > >> when the tables that aren't in publication should not be replicated.
> > > > >>
> > > > >> Attached the patch that fixes it.  Thought?
> > > > >
> > > > > With that change, we don't get the behaviour that's stated in the
> > > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove
> > > > > one or more tables from the publication. Note that adding tables to a
> > > > > publication that is already subscribed to will require a ALTER
> > > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> > > > > order to become effective" -
> > > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
> > > > >
> > > >
> > > > The documentation only emphasize adding tables to a publication, not
> > > > include dropping tables from a publication.
> > > >
> > >
> > > Right and I think that is ensured by the subscriber by calling
> > > should_apply_changes_for_rel() which won't return true unless the
> > > newly added relation is not synced via Refresh Publication. So, this
> > > means with or without this patch we should be sending the changes of
> > > the newly published table from the publisher?
> >
> > Oh, my bad, alter subscription...refresh publication is required only when 
> > the tables are added to the publisher. Patch by japin makes the walsender 
> > process to stop sending the data to the subscriber/apply worker. The patch 
> > is based on the idea of looking at the PUBLICATIONRELMAP in 
> > get_rel_sync_entry when the entries have been invalidated in 
> > rel_sync_cache_publication_cb because of alter publication...drop table.
> > ,
> > When the alter subscription...refresh publication is run on the subscriber, 
> > the SUBSCRIPTIONRELMAP catalogue gets invalidated but the corresponding 
> > cache entries in the LogicalRepRelMap which is used by logicalrep_rel_open 
> > are not invalidated. LogicalRepRelMap is used to know the relations that 
> > are associated with the subscription. But we seem to have not taken care of 
> > invalidating those entries, though we have the invalidation callback 
> > invalidate_syncing_table_states registered for SUBSCRIPTIONRELMAP in 
> > ApplyWorkerMain. So, we miss to work on updating the entries in 
> > LogicalRepRelMap.
> >
> > IMO, the ideal way to fix this issue is 1) stop the walsender sending the 
> > changes to dropped tables, for this japin patch works 2) we must mark all 
> > the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states 
> > so that in the next logicalrep_rel_open, if the entry is invalidated, then 
> > we have to call GetSubscriptionRelState to get the latest state, as shown 
> > in [1]. Likewise, we might also have to mark the cache entries invalid in  
> > subscription_change_cb which is invalidation callback for pg_subscription
> >
>
> Is the second point you described here is related to the original
> bug-reported or will it cause some other problem?

It can cause some other problems. I found this while investigating
from the subscriber perspective why the subscriber is accepting the
inserts even though the relation is removed from the
pg_subscription_rel catalogue after refresh publication. I ended up
finding that the cache entries are not being invalidated in
invalidate_syncing_table_states.

Having said that, the patch proposed by japin is enough to solve the
bug reported here.

While we are fixing the bug, I thought it's better to fix this as
well, maybe as a 0002 patch? If okay, I can work on the patch and post
it in a separate thread?

> > Thoughts?
> >
> > [1] -
> > if (entry->state != SUBREL_STATE_READY || entry->invalid)
> > entry->state = GetSubscriptionRelState(MySubscription->oid,
> >entry->localreloid,
> >&entry->statelsn);
> >
> >if (entry->invalid)
> >entry->invalid = false;
> >
> > return entry;
> >
> > > I have another question on your patch which is why in some cases like
> > > when we have not inserted in step-5 (as mentioned by you) the
> > > following insertions are not sent. Is somehow we are setting the
> > > pubactions as false in that case, if so, how?
> >
> > The reason is that the issue reported in this thread occurs - when we have 
> > the walsender process running, RelationSyncCache is initialized, we 
> > inserted some data into the table that's sent to the subscriber and the 
> > table is dropped, we miss to set the pubactions to false in 
> > get_rel_

Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread japin


On Wed, 13 Jan 2021 at 13:26, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy
>  wrote:
>>
>> On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila  wrote:
>> > > Here's my analysis:
>> > > 1) in the publisher, alter publication drop table successfully
>> > > removes(PublicationDropTables) the table from the catalogue
>> > > pg_publication_rel
>> > > 2) in the subscriber, alter subscription refresh publication
>> > > successfully removes the table from the catalogue pg_subscription_rel
>> > > (AlterSubscription_refresh->RemoveSubscriptionRel)
>> > > so far so good
>> > >
>> >
>> > Here, it should register the worker to stop on commit, and then on
>> > commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
>> > Once the apply worker is stopped, the corresponding WALSender will
>> > also be stopped. Something here is not happening as per expected
>> > behavior.
>>
>> On the subscriber, an entry for worker stop is created in 
>> AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end 
>> of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it cannot 
>> be stopped because logicalrep_worker_find returns null 
>> (AtEOXact_ApplyLauncher --> logicalrep_worker_stop --> 
>> logicalrep_worker_find). The worker entry for that subscriber is having 
>> relid as 0 [1], due to which the following if condition will not be hit. The 
>> apply worker on the subscriber related to the subscription on which refresh 
>> publication was run is not closed. It looks like relid 0 is valid because it 
>> will be applicable only during the table sync phase, the comment in the 
>> LogicalRepWorker structure says that.
>>
>> And also, I think, expecting the apply worker to be closed this way doesn't 
>> make sense because the apply worker is a per-subscription base, and the 
>> subscription can have other tables too.
>>
>
> Okay, that makes sense. As responded to Li Japin, let's focus on
> figuring out why we are sending the changes from the publisher node in
> some cases and not in other cases.

After some analysis, I find that the dropped tables always replicate to 
subscriber.
The difference is that if we drop the table from publication and refresh
publication (on subscriber), the LogicalRepRelMapEntry in 
should_apply_changes_for_rel()
set state to SUBREL_STATE_UNKNOWN.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, 
atttyps = 0x5564fb017780,
replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = 
true,
  localreloid = 16412, localrel = 0x7f78705da1b8, attrmap = 0x5564fb017800, 
updatable = false,
  *state = 0 '\000'*, statelsn = 0}

If we insert data between drop table from publication and refresh publication, 
the
LogicalRepRelMapEntry state is always SUBREL_STATE_READY.

(gdb) p *rel
$2 = {remoterel = {remoteid = 16410, nspname = 0x5564fb0177c0 "public",
relname = 0x5564fb0177a0 "t1", natts = 1, attnames = 0x5564fb0177e0, 
atttyps = 0x5564fb017780,
replident = 100 'd', relkind = 0 '\000', attkeys = 0x0}, localrelvalid = 
true,
  localreloid = 16412, localrel = 0x7f78705d9d38, attrmap = 0x5564fb017800, 
updatable = false,
  *state = 114 'r'*, statelsn = 23545672}

I will dig why the state of LogicalRepRelMapEntry doesn't change in second case.

Any suggestion is welcome!

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread Amit Kapila
On Wed, Jan 13, 2021 at 11:08 AM Bharath Rupireddy
 wrote:
>
> On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila  wrote:
> >
> > On Tue, Jan 12, 2021 at 5:23 PM japin  wrote:
> > >
> > > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> > > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
> > > >> IIUC the logical replication only replicate the tables in publication, 
> > > >> I think
> > > >> when the tables that aren't in publication should not be replicated.
> > > >>
> > > >> Attached the patch that fixes it.  Thought?
> > > >
> > > > With that change, we don't get the behaviour that's stated in the
> > > > document - "The ADD TABLE and DROP TABLE clauses will add and remove
> > > > one or more tables from the publication. Note that adding tables to a
> > > > publication that is already subscribed to will require a ALTER
> > > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> > > > order to become effective" -
> > > > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
> > > >
> > >
> > > The documentation only emphasize adding tables to a publication, not
> > > include dropping tables from a publication.
> > >
> >
> > Right and I think that is ensured by the subscriber by calling
> > should_apply_changes_for_rel() which won't return true unless the
> > newly added relation is not synced via Refresh Publication. So, this
> > means with or without this patch we should be sending the changes of
> > the newly published table from the publisher?
>
> Oh, my bad, alter subscription...refresh publication is required only when 
> the tables are added to the publisher. Patch by japin makes the walsender 
> process to stop sending the data to the subscriber/apply worker. The patch is 
> based on the idea of looking at the PUBLICATIONRELMAP in get_rel_sync_entry 
> when the entries have been invalidated in rel_sync_cache_publication_cb 
> because of alter publication...drop table.
> ,
> When the alter subscription...refresh publication is run on the subscriber, 
> the SUBSCRIPTIONRELMAP catalogue gets invalidated but the corresponding cache 
> entries in the LogicalRepRelMap which is used by logicalrep_rel_open are not 
> invalidated. LogicalRepRelMap is used to know the relations that are 
> associated with the subscription. But we seem to have not taken care of 
> invalidating those entries, though we have the invalidation callback 
> invalidate_syncing_table_states registered for SUBSCRIPTIONRELMAP in 
> ApplyWorkerMain. So, we miss to work on updating the entries in 
> LogicalRepRelMap.
>
> IMO, the ideal way to fix this issue is 1) stop the walsender sending the 
> changes to dropped tables, for this japin patch works 2) we must mark all the 
> LogicalRepRelMap entries as invalid in invalidate_syncing_table_states so 
> that in the next logicalrep_rel_open, if the entry is invalidated, then we 
> have to call GetSubscriptionRelState to get the latest state, as shown in 
> [1]. Likewise, we might also have to mark the cache entries invalid in  
> subscription_change_cb which is invalidation callback for pg_subscription
>

Is the second point you described here is related to the original
bug-reported or will it cause some other problem?

> Thoughts?
>
> [1] -
> if (entry->state != SUBREL_STATE_READY || entry->invalid)
> entry->state = GetSubscriptionRelState(MySubscription->oid,
>entry->localreloid,
>&entry->statelsn);
>
>if (entry->invalid)
>entry->invalid = false;
>
> return entry;
>
> > I have another question on your patch which is why in some cases like
> > when we have not inserted in step-5 (as mentioned by you) the
> > following insertions are not sent. Is somehow we are setting the
> > pubactions as false in that case, if so, how?
>
> The reason is that the issue reported in this thread occurs - when we have 
> the walsender process running, RelationSyncCache is initialized, we inserted 
> some data into the table that's sent to the subscriber and the table is 
> dropped, we miss to set the pubactions to false in get_rel_sync_entry, though 
> the cache entries have been invalidated.
>
> In some cases, it works properly because the old walsender process was 
> stopped and when a new walsender is started, then the cache RelationSyncCache 
> gets initialized again and the pubactions will be set to false in 
> get_rel_sync_entry.
>

Why is walsender process was getting stopped in one case but not in another?

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread Bharath Rupireddy
On Wed, Jan 13, 2021 at 10:33 AM Amit Kapila 
wrote:
>
> On Tue, Jan 12, 2021 at 5:23 PM japin  wrote:
> >
> > On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> > > On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
> > >> IIUC the logical replication only replicate the tables in
publication, I think
> > >> when the tables that aren't in publication should not be replicated.
> > >>
> > >> Attached the patch that fixes it.  Thought?
> > >
> > > With that change, we don't get the behaviour that's stated in the
> > > document - "The ADD TABLE and DROP TABLE clauses will add and remove
> > > one or more tables from the publication. Note that adding tables to a
> > > publication that is already subscribed to will require a ALTER
> > > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> > > order to become effective" -
> > > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
> > >
> >
> > The documentation only emphasize adding tables to a publication, not
> > include dropping tables from a publication.
> >
>
> Right and I think that is ensured by the subscriber by calling
> should_apply_changes_for_rel() which won't return true unless the
> newly added relation is not synced via Refresh Publication. So, this
> means with or without this patch we should be sending the changes of
> the newly published table from the publisher?

Oh, my bad, alter subscription...refresh publication is required only when
the tables are added to the publisher. Patch by japin makes the walsender
process to stop sending the data to the subscriber/apply worker. The patch
is based on the idea of looking at the PUBLICATIONRELMAP in
get_rel_sync_entry when the entries have been invalidated in
rel_sync_cache_publication_cb because of alter publication...drop table.
,
When the alter subscription...refresh publication is run on the subscriber,
the SUBSCRIPTIONRELMAP catalogue gets invalidated but the corresponding
cache entries in the LogicalRepRelMap which is used by logicalrep_rel_open
are not invalidated. LogicalRepRelMap is used to know the relations that
are associated with the subscription. But we seem to have not taken care of
invalidating those entries, though we have the invalidation callback
invalidate_syncing_table_states registered for SUBSCRIPTIONRELMAP in
ApplyWorkerMain. So, we miss to work on updating the entries in
LogicalRepRelMap.

IMO, the ideal way to fix this issue is 1) stop the walsender sending the
changes to dropped tables, for this japin patch works 2) we must mark all
the LogicalRepRelMap entries as invalid in invalidate_syncing_table_states
so that in the next logicalrep_rel_open, if the entry is invalidated, then
we have to call GetSubscriptionRelState to get the latest state, as shown
in [1]. Likewise, we might also have to mark the cache entries invalid in
 subscription_change_cb which is invalidation callback for pg_subscription

Thoughts?

[1] -
if (entry->state != SUBREL_STATE_READY *|| entry->invalid*)
entry->state = GetSubscriptionRelState(MySubscription->oid,
   entry->localreloid,
   &entry->statelsn);

   if (entry->invalid)
   entry->invalid = false;

return entry;

> I have another question on your patch which is why in some cases like
> when we have not inserted in step-5 (as mentioned by you) the
> following insertions are not sent. Is somehow we are setting the
> pubactions as false in that case, if so, how?

The reason is that the issue reported in this thread occurs - when we have
the walsender process running, RelationSyncCache is initialized, we
inserted some data into the table that's sent to the subscriber and the
table is dropped, we miss to set the pubactions to false in
get_rel_sync_entry, though the cache entries have been invalidated.

In some cases, it works properly because the old walsender process was
stopped and when a new walsender is started, then the cache
RelationSyncCache gets initialized again and the pubactions will be set to
false in get_rel_sync_entry.

if (!found)
{
/* immediately make a new entry valid enough to satisfy callbacks */
entry->schema_sent = false;
entry->streamed_txns = NIL;
entry->replicate_valid = false;
entry->pubactions.pubinsert = entry->pubactions.pubupdate =
entry->pubactions.pubdelete = entry->pubactions.pubtruncate =
false;
entry->publish_as_relid = InvalidOid;
}

Hope that clarifies why the issue happens in some cases and not in other
cases.

> > > The publisher stops sending the tuples whenever the relation gets
> > > dropped from the publication, not waiting until alter subscription ...
> > > refresh publication on the subscriber.
> > >
> >
> > If we want to wait the subscriber executing alter subscription ...
refresh publication,
> > maybe we should send some feedback to walsender.  How can we send this
feedback to
> > walsender in no

Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread Amit Kapila
On Tue, Jan 12, 2021 at 4:59 PM Bharath Rupireddy
 wrote:
>
> On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila  wrote:
> > > Here's my analysis:
> > > 1) in the publisher, alter publication drop table successfully
> > > removes(PublicationDropTables) the table from the catalogue
> > > pg_publication_rel
> > > 2) in the subscriber, alter subscription refresh publication
> > > successfully removes the table from the catalogue pg_subscription_rel
> > > (AlterSubscription_refresh->RemoveSubscriptionRel)
> > > so far so good
> > >
> >
> > Here, it should register the worker to stop on commit, and then on
> > commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> > Once the apply worker is stopped, the corresponding WALSender will
> > also be stopped. Something here is not happening as per expected
> > behavior.
>
> On the subscriber, an entry for worker stop is created in 
> AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end of 
> txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it cannot be 
> stopped because logicalrep_worker_find returns null (AtEOXact_ApplyLauncher 
> --> logicalrep_worker_stop --> logicalrep_worker_find). The worker entry for 
> that subscriber is having relid as 0 [1], due to which the following if 
> condition will not be hit. The apply worker on the subscriber related to the 
> subscription on which refresh publication was run is not closed. It looks 
> like relid 0 is valid because it will be applicable only during the table 
> sync phase, the comment in the LogicalRepWorker structure says that.
>
> And also, I think, expecting the apply worker to be closed this way doesn't 
> make sense because the apply worker is a per-subscription base, and the 
> subscription can have other tables too.
>

Okay, that makes sense. As responded to Li Japin, let's focus on
figuring out why we are sending the changes from the publisher node in
some cases and not in other cases.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread Amit Kapila
On Tue, Jan 12, 2021 at 5:23 PM japin  wrote:
>
> On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> > On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
> >> IIUC the logical replication only replicate the tables in publication, I 
> >> think
> >> when the tables that aren't in publication should not be replicated.
> >>
> >> Attached the patch that fixes it.  Thought?
> >
> > With that change, we don't get the behaviour that's stated in the
> > document - "The ADD TABLE and DROP TABLE clauses will add and remove
> > one or more tables from the publication. Note that adding tables to a
> > publication that is already subscribed to will require a ALTER
> > SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> > order to become effective" -
> > https://www.postgresql.org/docs/devel/sql-alterpublication.html.
> >
>
> The documentation only emphasize adding tables to a publication, not
> include dropping tables from a publication.
>

Right and I think that is ensured by the subscriber by calling
should_apply_changes_for_rel() which won't return true unless the
newly added relation is not synced via Refresh Publication. So, this
means with or without this patch we should be sending the changes of
the newly published table from the publisher?

I have another question on your patch which is why in some cases like
when we have not inserted in step-5 (as mentioned by you) the
following insertions are not sent. Is somehow we are setting the
pubactions as false in that case, if so, how?

> > The publisher stops sending the tuples whenever the relation gets
> > dropped from the publication, not waiting until alter subscription ...
> > refresh publication on the subscriber.
> >
>
> If we want to wait the subscriber executing alter subscription ... refresh 
> publication,
> maybe we should send some feedback to walsender.  How can we send this 
> feedback to
> walsender in non-walreceiver process?
>

I don't think we need this if what I said above is correct.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread japin


On Tue, 12 Jan 2021 at 19:32, Bharath Rupireddy wrote:
> On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
>> IIUC the logical replication only replicate the tables in publication, I 
>> think
>> when the tables that aren't in publication should not be replicated.
>>
>> Attached the patch that fixes it.  Thought?
>
> With that change, we don't get the behaviour that's stated in the
> document - "The ADD TABLE and DROP TABLE clauses will add and remove
> one or more tables from the publication. Note that adding tables to a
> publication that is already subscribed to will require a ALTER
> SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
> order to become effective" -
> https://www.postgresql.org/docs/devel/sql-alterpublication.html.
>

The documentation only emphasize adding tables to a publication, not
include dropping tables from a publication.

> The publisher stops sending the tuples whenever the relation gets
> dropped from the publication, not waiting until alter subscription ...
> refresh publication on the subscriber.
>

If we want to wait the subscriber executing alter subscription ... refresh 
publication,
maybe we should send some feedback to walsender.  How can we send this feedback 
to
walsender in non-walreceiver process?

> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com


-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread Bharath Rupireddy
On Tue, Jan 12, 2021 at 4:47 PM Li Japin  wrote:
> IIUC the logical replication only replicate the tables in publication, I think
> when the tables that aren't in publication should not be replicated.
>
> Attached the patch that fixes it.  Thought?

With that change, we don't get the behaviour that's stated in the
document - "The ADD TABLE and DROP TABLE clauses will add and remove
one or more tables from the publication. Note that adding tables to a
publication that is already subscribed to will require a ALTER
SUBSCRIPTION ... REFRESH PUBLICATION action on the subscribing side in
order to become effective" -
https://www.postgresql.org/docs/devel/sql-alterpublication.html.

The publisher stops sending the tuples whenever the relation gets
dropped from the publication, not waiting until alter subscription ...
refresh publication on the subscriber.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread Bharath Rupireddy
On Tue, Jan 12, 2021 at 12:06 PM Amit Kapila 
wrote:
> > Here's my analysis:
> > 1) in the publisher, alter publication drop table successfully
> > removes(PublicationDropTables) the table from the catalogue
> > pg_publication_rel
> > 2) in the subscriber, alter subscription refresh publication
> > successfully removes the table from the catalogue pg_subscription_rel
> > (AlterSubscription_refresh->RemoveSubscriptionRel)
> > so far so good
> >
>
> Here, it should register the worker to stop on commit, and then on
> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> Once the apply worker is stopped, the corresponding WALSender will
> also be stopped. Something here is not happening as per expected
> behavior.

On the subscriber, an entry for worker stop is created in
AlterSubscription_refresh --> logicalrep_worker_stop_at_commit. At the end
of txn, in AtEOXact_ApplyLauncher, we try to stop that worker, but it
cannot be stopped because logicalrep_worker_find returns null
(AtEOXact_ApplyLauncher --> logicalrep_worker_stop -->
logicalrep_worker_find). The worker entry for that subscriber is having
relid as 0 [1], due to which the following if condition will not be hit.
The apply worker on the subscriber related to the subscription on which
refresh publication was run is not closed. It looks like relid 0 is valid
because it will be applicable only during the table sync phase, the comment
in the LogicalRepWorker structure says that.

And also, I think, expecting the apply worker to be closed this way doesn't
make sense because the apply worker is a per-subscription base, and the
subscription can have other tables too.

/* Search for attached worker for a given subscription id. */
for (i = 0; i < max_logical_replication_workers; i++)
{
LogicalRepWorker *w = &LogicalRepCtx->workers[i];

if (w->in_use && w->subid == subid && w->relid == relid &&
(!only_running || w->proc))
{
res = w;
break;
}
}

[1]
(gdb) p subid
$5 = 16391
(gdb) p relid
$6 = 16388
(gdb) p *w
$4 = {launch_time = 663760343317760, in_use = true, generation = 1, proc =
0x7fdfd9a7cc90,
  dbid = 12872, userid = 10, *subid = 16391*,* relid = 0*, relstate = 0
'\000', relstate_lsn = 0,
  relmutex = 0 '\000', last_lsn = 22798424, last_send_time =
663760483945980,
  last_recv_time = 663760483946087, reply_lsn = 22798424, reply_time =
663760483945980}

postgres=# select * from pg_stat_get_subscription(16391);
 subid | relid |  pid   | received_lsn |last_msg_send_time|
 last_msg_receipt_time   | latest_end_lsn | latest_end_time

---+---++--+--+--++--
 16391 |   | 466779 | 0/15BE140| 2021-01-12 15:26:48.778813+05:30 |
2021-01-12 15:26:48.778878+05:30 | 0/15BE140  | 2021-01-12
15:26:48.778813+05:30
(1 row)

> > 3) after the insertion into the table in the publisher(remember that
> > it's dropped from the publication in (1)), the walsender process is
> > unable detect that the table has been dropped from the publication
> > i.e. it doesn't look at the pg_publication_rel catalogue or some
> > other, but it only does is_publishable_relation() check which returns
> > true in pgoutput_change(). Maybe the walsender should look at the
> > catalogue pg_publication_rel in is_publishable_relation()?
> >
>
> We must be somewhere checking pg_publication_rel before sending the
> decoded change because otherwise, we would have sent the changes for
> the table which are not even part of this publication. I think you can
> try to create a separate table that is not part of the publication
> under test and see how the changes for that are filtered.

As pointed out by japin in the next thread, the walsender process in the
publisher uses RelationSyncCache to hold the relations to which the
insertions need to be sent to the subscriber. The RelationSyncCache gets
created during startup of the walsender
process(pgoutput_startup->init_rel_sync_cache) and also
rel_sync_cache_publication_cb callback gets registered. So, if any alters
happen to the pg_publication_rel catalog table, the callback
rel_sync_cache_publication_cb is called, all the entries in the
RelationSyncCache are marked as invalid, with the expectation that on the
next use of any cache entry in get_rel_sync_entry
(pgoutput_change->get_rel_sync_entry), that entry is validated again.

In the use case, the invalidation happens as expected in
rel_sync_cache_publication_cb and while revalidating the entry in
get_rel_sync_entry, since there is only one publication to which the given
relation is attached to, the pubids will be null and we don't set the
entry->pubactions.pubinsert/update/delete/truncate to false, so the
publisher keeps publishing the inserts to the relation.

/* Validate the entry */
if (!entry->replicate_valid)
{
List   *pubids

Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread Li Japin

On Jan 12, 2021, at 5:47 PM, japin 
mailto:japi...@hotmail.com>> wrote:


On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
mailto:bharath.rupireddyforpostg...@gmail.com>>
 wrote:

On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila 
mailto:amit.kapil...@gmail.com>> wrote:

On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
mailto:bharath.rupireddyforpostg...@gmail.com>>
 wrote:

Hi,

While providing thoughts on the design in [1], I found a strange
behaviour with the $subject. The use case is shown below as a sequence
of steps that need to be run on publisher and subscriber to arrive at
the strange behaviour.  In step 5, the table is dropped from the
publication and in step 6, the refresh publication is run on the
subscriber, from here onwards, the expectation is that no further
inserts into the publisher table have to be replicated on to the
subscriber, but the opposite happens i.e. the inserts are still
replicated to the subscriber. ISTM as a bug. Let me know if I'm
missing anything.


Did you try to investigate what's going on? Can you please check what
is the behavior if, after step-5, you restart the subscriber and
separately try creating a new subscription (maybe on a different
server) for that publication after step-5 and see if that allows the
relation to be replicated? AFAIU, in AlterSubscription_refresh, we
remove such dropped rels and stop their corresponding apply workers
which should stop the further replication of such relations but that
doesn't seem to be happening in your case.

Here's my analysis:
1) in the publisher, alter publication drop table successfully
removes(PublicationDropTables) the table from the catalogue
pg_publication_rel
2) in the subscriber, alter subscription refresh publication
successfully removes the table from the catalogue pg_subscription_rel
(AlterSubscription_refresh->RemoveSubscriptionRel)
so far so good


Here, it should register the worker to stop on commit, and then on
commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
Once the apply worker is stopped, the corresponding WALSender will
also be stopped. Something here is not happening as per expected
behavior.

3) after the insertion into the table in the publisher(remember that
it's dropped from the publication in (1)), the walsender process is
unable detect that the table has been dropped from the publication
i.e. it doesn't look at the pg_publication_rel catalogue or some
other, but it only does is_publishable_relation() check which returns
true in pgoutput_change(). Maybe the walsender should look at the
catalogue pg_publication_rel in is_publishable_relation()?


We must be somewhere checking pg_publication_rel before sending the
decoded change because otherwise, we would have sent the changes for
the table which are not even part of this publication. I think you can
try to create a separate table that is not part of the publication
under test and see how the changes for that are filtered.

I find that pgoutput_change() use a hash table RelationSyncCache to
cache the publication info for tables.  When we drop tables from the
publication, the RelationSyncCache doesn't updated, so it replicate
records.


IIUC the logical replication only replicate the tables in publication, I think
when the tables that aren't in publication should not be replicated.

Attached the patch that fixes it.  Thought?

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.






alter-publication-drop-table.patch
Description: alter-publication-drop-table.patch


Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-12 Thread japin


On Tue, 12 Jan 2021 at 14:38, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
>  wrote:
>>
>> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila  wrote:
>> >
>> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>> >  wrote:
>> > >
>> > > Hi,
>> > >
>> > > While providing thoughts on the design in [1], I found a strange
>> > > behaviour with the $subject. The use case is shown below as a sequence
>> > > of steps that need to be run on publisher and subscriber to arrive at
>> > > the strange behaviour.  In step 5, the table is dropped from the
>> > > publication and in step 6, the refresh publication is run on the
>> > > subscriber, from here onwards, the expectation is that no further
>> > > inserts into the publisher table have to be replicated on to the
>> > > subscriber, but the opposite happens i.e. the inserts are still
>> > > replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> > > missing anything.
>> > >
>> >
>> > Did you try to investigate what's going on? Can you please check what
>> > is the behavior if, after step-5, you restart the subscriber and
>> > separately try creating a new subscription (maybe on a different
>> > server) for that publication after step-5 and see if that allows the
>> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
>> > remove such dropped rels and stop their corresponding apply workers
>> > which should stop the further replication of such relations but that
>> > doesn't seem to be happening in your case.
>>
>> Here's my analysis:
>> 1) in the publisher, alter publication drop table successfully
>> removes(PublicationDropTables) the table from the catalogue
>> pg_publication_rel
>> 2) in the subscriber, alter subscription refresh publication
>> successfully removes the table from the catalogue pg_subscription_rel
>> (AlterSubscription_refresh->RemoveSubscriptionRel)
>> so far so good
>>
>
> Here, it should register the worker to stop on commit, and then on
> commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
> Once the apply worker is stopped, the corresponding WALSender will
> also be stopped. Something here is not happening as per expected
> behavior.
>
>> 3) after the insertion into the table in the publisher(remember that
>> it's dropped from the publication in (1)), the walsender process is
>> unable detect that the table has been dropped from the publication
>> i.e. it doesn't look at the pg_publication_rel catalogue or some
>> other, but it only does is_publishable_relation() check which returns
>> true in pgoutput_change(). Maybe the walsender should look at the
>> catalogue pg_publication_rel in is_publishable_relation()?
>>
>
> We must be somewhere checking pg_publication_rel before sending the
> decoded change because otherwise, we would have sent the changes for
> the table which are not even part of this publication. I think you can
> try to create a separate table that is not part of the publication
> under test and see how the changes for that are filtered.

I find that pgoutput_change() use a hash table RelationSyncCache to
cache the publication info for tables.  When we drop tables from the
publication, the RelationSyncCache doesn't updated, so it replicate
records.

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread Amit Kapila
On Tue, Jan 12, 2021 at 11:39 AM Bharath Rupireddy
 wrote:
>
> On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila  wrote:
> >
> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> >  wrote:
> > >
> > > Hi,
> > >
> > > While providing thoughts on the design in [1], I found a strange
> > > behaviour with the $subject. The use case is shown below as a sequence
> > > of steps that need to be run on publisher and subscriber to arrive at
> > > the strange behaviour.  In step 5, the table is dropped from the
> > > publication and in step 6, the refresh publication is run on the
> > > subscriber, from here onwards, the expectation is that no further
> > > inserts into the publisher table have to be replicated on to the
> > > subscriber, but the opposite happens i.e. the inserts are still
> > > replicated to the subscriber. ISTM as a bug. Let me know if I'm
> > > missing anything.
> > >
> >
> > Did you try to investigate what's going on? Can you please check what
> > is the behavior if, after step-5, you restart the subscriber and
> > separately try creating a new subscription (maybe on a different
> > server) for that publication after step-5 and see if that allows the
> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> > remove such dropped rels and stop their corresponding apply workers
> > which should stop the further replication of such relations but that
> > doesn't seem to be happening in your case.
>
> Here's my analysis:
> 1) in the publisher, alter publication drop table successfully
> removes(PublicationDropTables) the table from the catalogue
> pg_publication_rel
> 2) in the subscriber, alter subscription refresh publication
> successfully removes the table from the catalogue pg_subscription_rel
> (AlterSubscription_refresh->RemoveSubscriptionRel)
> so far so good
>

Here, it should register the worker to stop on commit, and then on
commit it should call AtEOXact_ApplyLauncher to stop the apply worker.
Once the apply worker is stopped, the corresponding WALSender will
also be stopped. Something here is not happening as per expected
behavior.

> 3) after the insertion into the table in the publisher(remember that
> it's dropped from the publication in (1)), the walsender process is
> unable detect that the table has been dropped from the publication
> i.e. it doesn't look at the pg_publication_rel catalogue or some
> other, but it only does is_publishable_relation() check which returns
> true in pgoutput_change(). Maybe the walsender should look at the
> catalogue pg_publication_rel in is_publishable_relation()?
>

We must be somewhere checking pg_publication_rel before sending the
decoded change because otherwise, we would have sent the changes for
the table which are not even part of this publication. I think you can
try to create a separate table that is not part of the publication
under test and see how the changes for that are filtered.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread Bharath Rupireddy
On Tue, Jan 12, 2021 at 9:05 AM Amit Kapila  wrote:
>
> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>  wrote:
> >
> > Hi,
> >
> > While providing thoughts on the design in [1], I found a strange
> > behaviour with the $subject. The use case is shown below as a sequence
> > of steps that need to be run on publisher and subscriber to arrive at
> > the strange behaviour.  In step 5, the table is dropped from the
> > publication and in step 6, the refresh publication is run on the
> > subscriber, from here onwards, the expectation is that no further
> > inserts into the publisher table have to be replicated on to the
> > subscriber, but the opposite happens i.e. the inserts are still
> > replicated to the subscriber. ISTM as a bug. Let me know if I'm
> > missing anything.
> >
>
> Did you try to investigate what's going on? Can you please check what
> is the behavior if, after step-5, you restart the subscriber and
> separately try creating a new subscription (maybe on a different
> server) for that publication after step-5 and see if that allows the
> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> remove such dropped rels and stop their corresponding apply workers
> which should stop the further replication of such relations but that
> doesn't seem to be happening in your case.

Here's my analysis:
1) in the publisher, alter publication drop table successfully
removes(PublicationDropTables) the table from the catalogue
pg_publication_rel
2) in the subscriber, alter subscription refresh publication
successfully removes the table from the catalogue pg_subscription_rel
(AlterSubscription_refresh->RemoveSubscriptionRel)
so far so good
3) after the insertion into the table in the publisher(remember that
it's dropped from the publication in (1)), the walsender process is
unable detect that the table has been dropped from the publication
i.e. it doesn't look at the pg_publication_rel catalogue or some
other, but it only does is_publishable_relation() check which returns
true in pgoutput_change(). Maybe the walsender should look at the
catalogue pg_publication_rel in is_publishable_relation()?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread japin


On Tue, 12 Jan 2021 at 13:39, Amit Kapila wrote:
> On Tue, Jan 12, 2021 at 9:58 AM japin  wrote:
>>
>>
>> On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote:
>> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>> >  wrote:
>> >>
>> >> Hi,
>> >>
>> >> While providing thoughts on the design in [1], I found a strange
>> >> behaviour with the $subject. The use case is shown below as a sequence
>> >> of steps that need to be run on publisher and subscriber to arrive at
>> >> the strange behaviour.  In step 5, the table is dropped from the
>> >> publication and in step 6, the refresh publication is run on the
>> >> subscriber, from here onwards, the expectation is that no further
>> >> inserts into the publisher table have to be replicated on to the
>> >> subscriber, but the opposite happens i.e. the inserts are still
>> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> >> missing anything.
>> >>
>> >
>> > Did you try to investigate what's going on? Can you please check what
>> > is the behavior if, after step-5, you restart the subscriber and
>> > separately try creating a new subscription (maybe on a different
>> > server) for that publication after step-5 and see if that allows the
>> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
>> > remove such dropped rels and stop their corresponding apply workers
>> > which should stop the further replication of such relations but that
>> > doesn't seem to be happening in your case.
>>
>> If we restart the subscriber after step-5, it will not replicate the records.
>>
>> As I said in [1], if we don't insert a new data in step-5, it will not
>> replicate the records.
>>
>
> Hmm, but in Bharath's test, it is replicating the Inserts in step-7
> and step-9 as well. Are you seeing something different?
>

Yes, however if we don't Inserts in step-5, the Inserts in step-7 and
step-9 will not replicate.


-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread Amit Kapila
On Tue, Jan 12, 2021 at 9:58 AM japin  wrote:
>
>
> On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote:
> > On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
> >  wrote:
> >>
> >> Hi,
> >>
> >> While providing thoughts on the design in [1], I found a strange
> >> behaviour with the $subject. The use case is shown below as a sequence
> >> of steps that need to be run on publisher and subscriber to arrive at
> >> the strange behaviour.  In step 5, the table is dropped from the
> >> publication and in step 6, the refresh publication is run on the
> >> subscriber, from here onwards, the expectation is that no further
> >> inserts into the publisher table have to be replicated on to the
> >> subscriber, but the opposite happens i.e. the inserts are still
> >> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> >> missing anything.
> >>
> >
> > Did you try to investigate what's going on? Can you please check what
> > is the behavior if, after step-5, you restart the subscriber and
> > separately try creating a new subscription (maybe on a different
> > server) for that publication after step-5 and see if that allows the
> > relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> > remove such dropped rels and stop their corresponding apply workers
> > which should stop the further replication of such relations but that
> > doesn't seem to be happening in your case.
>
> If we restart the subscriber after step-5, it will not replicate the records.
>
> As I said in [1], if we don't insert a new data in step-5, it will not
> replicate the records.
>

Hmm, but in Bharath's test, it is replicating the Inserts in step-7
and step-9 as well. Are you seeing something different?

> In both cases, the AlterSubscription_refresh() call RemoveSubscriptionRel()
> and logicalrep_worker_stop_at_commit().  However, if we insert a data in
> step-5, it doesn't work as expected.  Any thoughts?
>

I think the data inserted in step-5 might be visible because we have
stopped the apply process after that but it is not clear why the data
inserted in steps 7 and 9 is getting replicated. I think due to some
reason apply worker is not getting stopped even after Refresh
Publication statement is finished due to which the data is being
replicated after that as well and after restart the apply worker won't
be restarted so the data replication doesn't happen.

-- 
With Regards,
Amit Kapila.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread japin


On Tue, 12 Jan 2021 at 11:37, Amit Kapila wrote:
> On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
>  wrote:
>>
>> Hi,
>>
>> While providing thoughts on the design in [1], I found a strange
>> behaviour with the $subject. The use case is shown below as a sequence
>> of steps that need to be run on publisher and subscriber to arrive at
>> the strange behaviour.  In step 5, the table is dropped from the
>> publication and in step 6, the refresh publication is run on the
>> subscriber, from here onwards, the expectation is that no further
>> inserts into the publisher table have to be replicated on to the
>> subscriber, but the opposite happens i.e. the inserts are still
>> replicated to the subscriber. ISTM as a bug. Let me know if I'm
>> missing anything.
>>
>
> Did you try to investigate what's going on? Can you please check what
> is the behavior if, after step-5, you restart the subscriber and
> separately try creating a new subscription (maybe on a different
> server) for that publication after step-5 and see if that allows the
> relation to be replicated? AFAIU, in AlterSubscription_refresh, we
> remove such dropped rels and stop their corresponding apply workers
> which should stop the further replication of such relations but that
> doesn't seem to be happening in your case.

If we restart the subscriber after step-5, it will not replicate the records.

As I said in [1], if we don't insert a new data in step-5, it will not
replicate the records.

In both cases, the AlterSubscription_refresh() call RemoveSubscriptionRel()
and logicalrep_worker_stop_at_commit().  However, if we insert a data in
step-5, it doesn't work as expected.  Any thoughts?

[1] 
https://www.postgresql.org/message-id/a7a618fb-f87c-439c-90a3-93cf9e734...@hotmail.com

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.




Re: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION

2021-01-11 Thread Amit Kapila
On Mon, Jan 11, 2021 at 6:51 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> While providing thoughts on the design in [1], I found a strange
> behaviour with the $subject. The use case is shown below as a sequence
> of steps that need to be run on publisher and subscriber to arrive at
> the strange behaviour.  In step 5, the table is dropped from the
> publication and in step 6, the refresh publication is run on the
> subscriber, from here onwards, the expectation is that no further
> inserts into the publisher table have to be replicated on to the
> subscriber, but the opposite happens i.e. the inserts are still
> replicated to the subscriber. ISTM as a bug. Let me know if I'm
> missing anything.
>

Did you try to investigate what's going on? Can you please check what
is the behavior if, after step-5, you restart the subscriber and
separately try creating a new subscription (maybe on a different
server) for that publication after step-5 and see if that allows the
relation to be replicated? AFAIU, in AlterSubscription_refresh, we
remove such dropped rels and stop their corresponding apply workers
which should stop the further replication of such relations but that
doesn't seem to be happening in your case.

-- 
With Regards,
Amit Kapila.