Re: Handle infinite recursion in logical replication setup

2023-08-16 Thread Peter Eisentraut

On 09.08.23 04:50, Peter Smith wrote:

On Tue, Aug 8, 2023 at 6:52 PM Amit Kapila  wrote:


On Tue, Aug 8, 2023 at 1:50 PM Peter Eisentraut  wrote:


This patch added the following error message:

errdetail_plural("Subscribed publication %s is subscribing to other
publications.",
"Subscribed publications %s are subscribing to other publications.",
list_length(publist), pubnames->data),

But in PostgreSQL, a publication cannot subscribe to a publication, so
this is not giving accurate information.  Apparently, what it is trying
to say is that

The subscription that you are creating subscribes to publications that
contain tables that are written to by other subscriptions.

Can we get to a more accurate wording like this?



+1 for changing the message as per your suggestion.



PSA a patch to change this message text. The message now has wording
similar to the suggestion.


committed, thanks





Re: Handle infinite recursion in logical replication setup

2023-08-08 Thread Amit Kapila
On Wed, Aug 9, 2023 at 10:59 AM Peter Smith  wrote:
> >
> Example output using the patch look like this:
>
> SINGLULAR
> test_sub=# create subscription sub_test connection 'dbname=test_pub'
> publication pub_all_at_pub with(origin=NONE);
> WARNING:  subscription "sub_test" requested copy_data with origin =
> NONE but might copy data that had a different origin
> DETAIL:  The subscription that you are creating has a publication
> ("pub_all_at_pub") containing tables written to by other
> subscriptions.
> HINT:  Verify that initial data copied from the publisher tables did
> not come from other origins.
> NOTICE:  created replication slot "sub_test" on publisher
> CREATE SUBSCRIPTION
>
> PLURAL
> test_sub=# create subscription sub_test3 connection 'dbname=test_pub'
> publication pub_all_at_pub,pub_s1_at_pub,pub_s2_at_pub
> with(origin=NONE);
> WARNING:  subscription "sub_test3" requested copy_data with origin =
> NONE but might copy data that had a different origin
> DETAIL:  The subscription that you are creating has publications
> ("pub_s1_at_pub", "pub_all_at_pub") containing tables written to by
> other subscriptions.
> HINT:  Verify that initial data copied from the publisher tables did
> not come from other origins.
> NOTICE:  created replication slot "sub_test3" on publisher
> CREATE SUBSCRIPTION
>
> ~
>
> I thought above looked fine but if PeterE also does not like the ()
> then the message can be rearranged slightly like below to put the
> offending publications at the end of the message, like this:
>

Okay, I didn't know strings were already quoted but not sure if Round
Brackets () exactly will address the translatability concern.

> DETAIL:  The subscription that you are creating has the following
> publications containing tables written to by other subscriptions:
> "pub_s1_at_pub", "pub_all_at_pub"
>

Fair enough. Peter E., do let us know what you think makes sense here?

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2023-08-08 Thread Peter Smith
On Wed, Aug 9, 2023 at 2:44 PM Amit Kapila  wrote:
>
> On Wed, Aug 9, 2023 at 8:20 AM Peter Smith  wrote:
> >
> > On Tue, Aug 8, 2023 at 6:52 PM Amit Kapila  wrote:
> > >
> > > On Tue, Aug 8, 2023 at 1:50 PM Peter Eisentraut  
> > > wrote:
> > > >
> > > > This patch added the following error message:
> > > >
> > > > errdetail_plural("Subscribed publication %s is subscribing to other
> > > > publications.",
> > > > "Subscribed publications %s are subscribing to other publications.",
> > > > list_length(publist), pubnames->data),
> > > >
> > > > But in PostgreSQL, a publication cannot subscribe to a publication, so
> > > > this is not giving accurate information.  Apparently, what it is trying
> > > > to say is that
> > > >
> > > > The subscription that you are creating subscribes to publications that
> > > > contain tables that are written to by other subscriptions.
> > > >
> > > > Can we get to a more accurate wording like this?
> > > >
> > >
> > > +1 for changing the message as per your suggestion.
> > >
> >
> > PSA a patch to change this message text. The message now has wording
> > similar to the suggestion.
> >
> > > > There is also a translatability issue there, in the way the publication
> > > > list is pasted into the message.
> > > >
> >
> > The name/list substitution is now done within parentheses, which AFAIK
> > will be enough to eliminate any translation ambiguities.
> >
>
> A similar instance as below uses \"%s\" instead. So, isn't using the
> same a better idea?
> errdetail_plural("LDAP search for filter \"%s\" on server \"%s\"
> returned %d entry.",
>   "LDAP search for filter \"%s\" on server \"%s\" returned %d entries.",
>
>

Hmm. I don't see any similarity -- there the plural is for the %d, not
for a list of string items. And in our case the publication name (or
list of names) are already quoted by the function returning that list,
so quoting them again doesn't really make sense.

Example output using the patch look like this:

SINGLULAR
test_sub=# create subscription sub_test connection 'dbname=test_pub'
publication pub_all_at_pub with(origin=NONE);
WARNING:  subscription "sub_test" requested copy_data with origin =
NONE but might copy data that had a different origin
DETAIL:  The subscription that you are creating has a publication
("pub_all_at_pub") containing tables written to by other
subscriptions.
HINT:  Verify that initial data copied from the publisher tables did
not come from other origins.
NOTICE:  created replication slot "sub_test" on publisher
CREATE SUBSCRIPTION

PLURAL
test_sub=# create subscription sub_test3 connection 'dbname=test_pub'
publication pub_all_at_pub,pub_s1_at_pub,pub_s2_at_pub
with(origin=NONE);
WARNING:  subscription "sub_test3" requested copy_data with origin =
NONE but might copy data that had a different origin
DETAIL:  The subscription that you are creating has publications
("pub_s1_at_pub", "pub_all_at_pub") containing tables written to by
other subscriptions.
HINT:  Verify that initial data copied from the publisher tables did
not come from other origins.
NOTICE:  created replication slot "sub_test3" on publisher
CREATE SUBSCRIPTION

~

I thought above looked fine but if PeterE also does not like the ()
then the message can be rearranged slightly like below to put the
offending publications at the end of the message, like this:

DETAIL:  The subscription that you are creating has the following
publications containing tables written to by other subscriptions:
"pub_s1_at_pub", "pub_all_at_pub"

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2023-08-08 Thread Amit Kapila
On Wed, Aug 9, 2023 at 8:20 AM Peter Smith  wrote:
>
> On Tue, Aug 8, 2023 at 6:52 PM Amit Kapila  wrote:
> >
> > On Tue, Aug 8, 2023 at 1:50 PM Peter Eisentraut  
> > wrote:
> > >
> > > This patch added the following error message:
> > >
> > > errdetail_plural("Subscribed publication %s is subscribing to other
> > > publications.",
> > > "Subscribed publications %s are subscribing to other publications.",
> > > list_length(publist), pubnames->data),
> > >
> > > But in PostgreSQL, a publication cannot subscribe to a publication, so
> > > this is not giving accurate information.  Apparently, what it is trying
> > > to say is that
> > >
> > > The subscription that you are creating subscribes to publications that
> > > contain tables that are written to by other subscriptions.
> > >
> > > Can we get to a more accurate wording like this?
> > >
> >
> > +1 for changing the message as per your suggestion.
> >
>
> PSA a patch to change this message text. The message now has wording
> similar to the suggestion.
>
> > > There is also a translatability issue there, in the way the publication
> > > list is pasted into the message.
> > >
>
> The name/list substitution is now done within parentheses, which AFAIK
> will be enough to eliminate any translation ambiguities.
>

A similar instance as below uses \"%s\" instead. So, isn't using the
same a better idea?
errdetail_plural("LDAP search for filter \"%s\" on server \"%s\"
returned %d entry.",
  "LDAP search for filter \"%s\" on server \"%s\" returned %d entries.",


-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2023-08-08 Thread Peter Smith
On Tue, Aug 8, 2023 at 6:52 PM Amit Kapila  wrote:
>
> On Tue, Aug 8, 2023 at 1:50 PM Peter Eisentraut  wrote:
> >
> > This patch added the following error message:
> >
> > errdetail_plural("Subscribed publication %s is subscribing to other
> > publications.",
> > "Subscribed publications %s are subscribing to other publications.",
> > list_length(publist), pubnames->data),
> >
> > But in PostgreSQL, a publication cannot subscribe to a publication, so
> > this is not giving accurate information.  Apparently, what it is trying
> > to say is that
> >
> > The subscription that you are creating subscribes to publications that
> > contain tables that are written to by other subscriptions.
> >
> > Can we get to a more accurate wording like this?
> >
>
> +1 for changing the message as per your suggestion.
>

PSA a patch to change this message text. The message now has wording
similar to the suggestion.

> > There is also a translatability issue there, in the way the publication
> > list is pasted into the message.
> >

The name/list substitution is now done within parentheses, which AFAIK
will be enough to eliminate any translation ambiguities.

> > Is the list of affected publications really that interesting?  I wonder
> > whether the list of affected tables might be more relevant?
> >
>
> In that case, we need to specify both schema name and table name in
> that case. I guess the list could be very long and not sure what to do
> for schema publications ( Create Publication ... For Schema).

Right, IIUC that was the reason for not choosing to show the tables in
the original patch -- i.e. the list might easily be very long with
100s or 1000s of tables it, and so inappropriate to substitute in the
message. OTOH, showing only problematic publications is a short list
but it is still more useful than showing nothing (e.g. there other
publications of the subscription might be OK so those ones are not
listed)

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Improved-message-for-create-subscription.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2023-08-08 Thread Amit Kapila
On Tue, Aug 8, 2023 at 1:50 PM Peter Eisentraut  wrote:
>
> On 12.09.22 07:23, vignesh C wrote:
> > On Fri, 9 Sept 2022 at 11:12, Amit Kapila  wrote:
> >>
> >> On Thu, Sep 8, 2022 at 9:32 AM vignesh C  wrote:
> >>>
> >>>
> >>> The attached patch has the changes to handle the same.
> >>>
> >>
> >> Pushed. I am not completely sure whether we want the remaining
> >> documentation patch in this thread in its current form or by modifying
> >> it. Johnathan has shown some interest in it. I feel you can start a
> >> separate thread for it to see if there is any interest in the same and
> >> close the CF entry for this work.
> >
> > Thanks for pushing the patch. I have closed this entry in commitfest.
> > I will wait for some more time and see the response regarding the
> > documentation patch and then start a new thread if required.
>
> This patch added the following error message:
>
> errdetail_plural("Subscribed publication %s is subscribing to other
> publications.",
> "Subscribed publications %s are subscribing to other publications.",
> list_length(publist), pubnames->data),
>
> But in PostgreSQL, a publication cannot subscribe to a publication, so
> this is not giving accurate information.  Apparently, what it is trying
> to say is that
>
> The subscription that you are creating subscribes to publications that
> contain tables that are written to by other subscriptions.
>
> Can we get to a more accurate wording like this?
>

+1 for changing the message as per your suggestion.

> There is also a translatability issue there, in the way the publication
> list is pasted into the message.
>
> Is the list of affected publications really that interesting?  I wonder
> whether the list of affected tables might be more relevant?
>

In that case, we need to specify both schema name and table name in
that case. I guess the list could be very long and not sure what to do
for schema publications ( Create Publication ... For Schema).

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2023-08-08 Thread Peter Eisentraut

On 12.09.22 07:23, vignesh C wrote:

On Fri, 9 Sept 2022 at 11:12, Amit Kapila  wrote:


On Thu, Sep 8, 2022 at 9:32 AM vignesh C  wrote:



The attached patch has the changes to handle the same.



Pushed. I am not completely sure whether we want the remaining
documentation patch in this thread in its current form or by modifying
it. Johnathan has shown some interest in it. I feel you can start a
separate thread for it to see if there is any interest in the same and
close the CF entry for this work.


Thanks for pushing the patch. I have closed this entry in commitfest.
I will wait for some more time and see the response regarding the
documentation patch and then start a new thread if required.


This patch added the following error message:

errdetail_plural("Subscribed publication %s is subscribing to other 
publications.",

"Subscribed publications %s are subscribing to other publications.",
list_length(publist), pubnames->data),

But in PostgreSQL, a publication cannot subscribe to a publication, so 
this is not giving accurate information.  Apparently, what it is trying 
to say is that


The subscription that you are creating subscribes to publications that 
contain tables that are written to by other subscriptions.


Can we get to a more accurate wording like this?

There is also a translatability issue there, in the way the publication 
list is pasted into the message.


Is the list of affected publications really that interesting?  I wonder 
whether the list of affected tables might be more relevant?






Re: Handle infinite recursion in logical replication setup

2023-01-10 Thread Amit Kapila
On Tue, Jan 10, 2023 at 11:24 PM Jonathan S. Katz  wrote:
>
> On 1/10/23 10:17 AM, Amit Kapila wrote:
> > On Tue, Jan 10, 2023 at 8:13 AM Jonathan S. Katz  
> > wrote:
>
> > One can use local or higher
> > for reducing the latency for COMMIT when synchronous replication is
> > used in the publisher. Won't using 'local' while creating subscription
> > would suffice the need to consistently replicate the data? I mean it
> > is equivalent to somebody using levels greater than local in case of
> > physical replication. I think in the case of physical replication, we
> > won't wait for standby to replicate to another node before sending a
> > response, so why to wait in the case of logical replication? If this
> > understanding is correct, then probably it is sufficient to support
> > 'local' for a subscription.
>
> I think this is a good explanation. We should incorporate some version
> of this into the docs, and add some clarity around the use of
> `synchronous_commit` option in `CREATE SUBSCRIPTION` in particular with
> the origin feature. I can make an attempt at this.
>

Okay, thanks!

> Perhaps another question: based on this, should we disallow setting
> values of `synchronous_commit` greater than `local` when using
> "origin=none"?
>

I think it should be done irrespective of the value of origin because
it can create a deadlock in those cases as well. I think one idea as
you suggest is to block levels higher than local and the other is to
make it behave like 'local' even if the level is higher than 'local'
which would be somewhat similar to our physical replication.

> That might be too strict, but maybe we should warn around
> the risk of deadlock either in the logs or in the docs.
>

Yeah, we can mention it in docs as well. We can probably document as
part of the docs work for this thread but it would be better to start
a separate thread to fix this behavior by either of the above two
approaches.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2023-01-10 Thread Jonathan S. Katz

On 1/10/23 10:17 AM, Amit Kapila wrote:

On Tue, Jan 10, 2023 at 8:13 AM Jonathan S. Katz  wrote:



This consistently created the deadlock in my testing.

Discussing with Masahiko off-list, this is due to a deadlock from 4
processes: the walsenders on A and B, and the apply workers on A and B.
The walsenders are waiting for feedback from the apply workers, and the
apply workers are waiting for the walsenders to synchronize (I may be
oversimplifying).

He suggested I try the above example instead with `synchronous_commit`
set to `local`. In this case, I verified that there is no more deadlock,
but he informed me that we would not be able to use cascading
synchronous replication when "origin=none".



This has nothing to do with the origin feature. I mean this should
happen with origin=any or even in PG15 without using origin at all.
Am, I missing something? One related point to note is that in physical
replication cascading replication is asynchronous. See docs [1]
(Cascading replication is currently asynchronous)


This is not directly related to the origin feature, but the origin 
feature makes it apparent. There is a common use-case where people want 
to replicate data synchronously between two tables, and this is enabled 
by the "origin" feature.


To be clear, my bigger concern is that it's fairly easy for users to 
create a deadlock situation based on how they set "synchronous_commit" 
with the origin feature -- this is the main reason why I brought it up. 
I'm less concerned about the "cascading" case, though I want to try out 
sync rep between 3 instances to see what happens.



If we decide that this is a documentation issue, I'd suggest we improve
the guidance around using `synchronous_commit`[1] on the CREATE
SUBSCRIPTION page, as the GUC page[2] warns against using `local`:



Yeah, but on Create Subscription page, we have mentioned that it is
safe to use off for logical replication.


While I think you can infer that it's "safe" for synchronous 
replication, I don't think it's clear.


We say it's "safe to use `off` for logical replication", but provide a 
lengthy explanation around synchronous logical replication that 
concludes that it's "advantageous to set synchronous_commit to local or 
higher" but does not address safety. The first line in the explanation 
of the parameter links to the `synchronous_commit` GUC which 
specifically advises against using "local" for synchronous replication.


Additionally, because we say "local" or higher, we increase the risk of 
the aforementioned in HEAD with the origin feature.


I know I'm hammering on this point a bit, but it feels like this is 
relatively easy to misconfigure with the upcoming "origin" change (I did 
so myself from reading the devel docs) and we should ensure we guide our 
users appropriately.



One can use local or higher
for reducing the latency for COMMIT when synchronous replication is
used in the publisher. Won't using 'local' while creating subscription
would suffice the need to consistently replicate the data? I mean it
is equivalent to somebody using levels greater than local in case of
physical replication. I think in the case of physical replication, we
won't wait for standby to replicate to another node before sending a
response, so why to wait in the case of logical replication? If this
understanding is correct, then probably it is sufficient to support
'local' for a subscription.


I think this is a good explanation. We should incorporate some version 
of this into the docs, and add some clarity around the use of 
`synchronous_commit` option in `CREATE SUBSCRIPTION` in particular with 
the origin feature. I can make an attempt at this.


Perhaps another question: based on this, should we disallow setting 
values of `synchronous_commit` greater than `local` when using 
"origin=none"? That might be too strict, but maybe we should warn around 
the risk of deadlock either in the logs or in the docs.


Thanks,

Jonathan



OpenPGP_signature
Description: OpenPGP digital signature


Re: Handle infinite recursion in logical replication setup

2023-01-10 Thread Amit Kapila
On Tue, Jan 10, 2023 at 8:13 AM Jonathan S. Katz  wrote:
>
> On 9/12/22 1:23 AM, vignesh C wrote:
> > On Fri, 9 Sept 2022 at 11:12, Amit Kapila  wrote:
> >
> > Thanks for pushing the patch. I have closed this entry in commitfest.
> > I will wait for some more time and see the response regarding the
> > documentation patch and then start a new thread if required.
>
> I've been testing this patch in advancing of working on the
> documentation and came across a behavior I wanted to note. Specifically,
> I am hitting a deadlock while trying to synchronous replicate between
> the two instances at any `synchronous_commit` level above `local`.
>
> Here is my set up. I have two instances, "A" and "B".
>
> On A and B, run:
>
>CREATE TABLE sync (id int PRIMARY KEY, info float);
>CREATE PUBLICATION sync FOR TABLE sync;
>
> On A, run:
>
>CREATE SUBSCRIPTION sync
>CONNECTION 'connstr-to-B'
>PUBLICATION sync
>WITH (
>  streaming=true, copy_data=false,
>  origin=none, synchronous_commit='on');
>
> On B, run:
>
>CREATE SUBSCRIPTION sync
>CONNECTION 'connstr-to-A'
>PUBLICATION sync
>WITH (
>  streaming=true, copy_data=false,
>  origin=none, synchronous_commit='on');
>
> On A and B, run:
>
>ALTER SYSTEM SET synchronous_standby_names TO 'sync';
>SELECT pg_reload_conf();
>
> Verify on A and B that pg_stat_replication.sync_state is set to "sync"
>
>SELECT application_name, sync_state = 'sync' AS is_sync
>FROM pg_stat_replication
>WHERE application_name = 'sync';
>
> The next to commands should be run simultaneously on A and B:
>
> -- run this on A
> INSERT INTO sync
> SELECT x, random() FROM generate_series(1,200, 2) x;
>
> -- run this on B
> INSERT INTO sync
> SELECT x, random() FROM generate_series(2,200, 2) x;
>
> This consistently created the deadlock in my testing.
>
> Discussing with Masahiko off-list, this is due to a deadlock from 4
> processes: the walsenders on A and B, and the apply workers on A and B.
> The walsenders are waiting for feedback from the apply workers, and the
> apply workers are waiting for the walsenders to synchronize (I may be
> oversimplifying).
>
> He suggested I try the above example instead with `synchronous_commit`
> set to `local`. In this case, I verified that there is no more deadlock,
> but he informed me that we would not be able to use cascading
> synchronous replication when "origin=none".
>

This has nothing to do with the origin feature. I mean this should
happen with origin=any or even in PG15 without using origin at all.
Am, I missing something? One related point to note is that in physical
replication cascading replication is asynchronous. See docs [1]
(Cascading replication is currently asynchronous)

> If we decide that this is a documentation issue, I'd suggest we improve
> the guidance around using `synchronous_commit`[1] on the CREATE
> SUBSCRIPTION page, as the GUC page[2] warns against using `local`:
>

Yeah, but on Create Subscription page, we have mentioned that it is
safe to use off for logical replication. One can use local or higher
for reducing the latency for COMMIT when synchronous replication is
used in the publisher. Won't using 'local' while creating subscription
would suffice the need to consistently replicate the data? I mean it
is equivalent to somebody using levels greater than local in case of
physical replication. I think in the case of physical replication, we
won't wait for standby to replicate to another node before sending a
response, so why to wait in the case of logical replication? If this
understanding is correct, then probably it is sufficient to support
'local' for a subscription.

[1] - https://www.postgresql.org/docs/devel/warm-standby.html

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2023-01-09 Thread Jonathan S. Katz

On 9/12/22 1:23 AM, vignesh C wrote:

On Fri, 9 Sept 2022 at 11:12, Amit Kapila  wrote:


On Thu, Sep 8, 2022 at 9:32 AM vignesh C  wrote:



The attached patch has the changes to handle the same.



Pushed. I am not completely sure whether we want the remaining
documentation patch in this thread in its current form or by modifying
it. Johnathan has shown some interest in it. I feel you can start a
separate thread for it to see if there is any interest in the same and
close the CF entry for this work.


Thanks for pushing the patch. I have closed this entry in commitfest.
I will wait for some more time and see the response regarding the
documentation patch and then start a new thread if required.


I've been testing this patch in advancing of working on the 
documentation and came across a behavior I wanted to note. Specifically, 
I am hitting a deadlock while trying to synchronous replicate between 
the two instances at any `synchronous_commit` level above `local`.


Here is my set up. I have two instances, "A" and "B".

On A and B, run:

  CREATE TABLE sync (id int PRIMARY KEY, info float);
  CREATE PUBLICATION sync FOR TABLE sync;

On A, run:

  CREATE SUBSCRIPTION sync
  CONNECTION 'connstr-to-B'
  PUBLICATION sync
  WITH (
streaming=true, copy_data=false,
origin=none, synchronous_commit='on');

On B, run:

  CREATE SUBSCRIPTION sync
  CONNECTION 'connstr-to-A'
  PUBLICATION sync
  WITH (
streaming=true, copy_data=false,
origin=none, synchronous_commit='on');

On A and B, run:

  ALTER SYSTEM SET synchronous_standby_names TO 'sync';
  SELECT pg_reload_conf();

Verify on A and B that pg_stat_replication.sync_state is set to "sync"

  SELECT application_name, sync_state = 'sync' AS is_sync
  FROM pg_stat_replication
  WHERE application_name = 'sync';

The next to commands should be run simultaneously on A and B:

-- run this on A
INSERT INTO sync
SELECT x, random() FROM generate_series(1,200, 2) x;

-- run this on B
INSERT INTO sync
SELECT x, random() FROM generate_series(2,200, 2) x;

This consistently created the deadlock in my testing.

Discussing with Masahiko off-list, this is due to a deadlock from 4 
processes: the walsenders on A and B, and the apply workers on A and B. 
The walsenders are waiting for feedback from the apply workers, and the 
apply workers are waiting for the walsenders to synchronize (I may be 
oversimplifying).


He suggested I try the above example instead with `synchronous_commit` 
set to `local`. In this case, I verified that there is no more deadlock, 
but he informed me that we would not be able to use cascading 
synchronous replication when "origin=none".


If we decide that this is a documentation issue, I'd suggest we improve 
the guidance around using `synchronous_commit`[1] on the CREATE 
SUBSCRIPTION page, as the GUC page[2] warns against using `local`:


"The setting local causes commits to wait for local flush to disk, but 
not for replication. This is usually not desirable when synchronous 
replication is in use, but is provided for completeness."


Thanks,

Jonathan

[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html
[2] 
https://www.postgresql.org/docs/devel/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT


OpenPGP_signature
Description: OpenPGP digital signature


Re: Handle infinite recursion in logical replication setup

2022-09-11 Thread vignesh C
On Fri, 9 Sept 2022 at 11:12, Amit Kapila  wrote:
>
> On Thu, Sep 8, 2022 at 9:32 AM vignesh C  wrote:
> >
> >
> > The attached patch has the changes to handle the same.
> >
>
> Pushed. I am not completely sure whether we want the remaining
> documentation patch in this thread in its current form or by modifying
> it. Johnathan has shown some interest in it. I feel you can start a
> separate thread for it to see if there is any interest in the same and
> close the CF entry for this work.

Thanks for pushing the patch. I have closed this entry in commitfest.
I will wait for some more time and see the response regarding the
documentation patch and then start a new thread if required.

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-08 Thread Amit Kapila
On Thu, Sep 8, 2022 at 9:32 AM vignesh C  wrote:
>
>
> The attached patch has the changes to handle the same.
>

Pushed. I am not completely sure whether we want the remaining
documentation patch in this thread in its current form or by modifying
it. Johnathan has shown some interest in it. I feel you can start a
separate thread for it to see if there is any interest in the same and
close the CF entry for this work.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-09-07 Thread vignesh C
On Thu, 8 Sept 2022 at 08:57, vignesh C  wrote:
>
> On Thu, 8 Sept 2022 at 08:23, Amit Kapila  wrote:
> >
> > On Thu, Sep 8, 2022 at 7:57 AM vignesh C  wrote:
> > >
> > > There is a buildfarm failure on mylodon at [1] because of the new test
> > > added. I will analyse and share the findings for the same.
> > >
> > > [1] - 
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2022-09-08%2001%3A40%3A27
> > >
> >
> > The log is as below:
> >
> > error running SQL: 'psql::1: ERROR:  could not drop relation
> > mapping for subscription "tap_sub_a_b_2"
> > DETAIL:  Table synchronization for relation "tab_new" is in progress
> > and is in state "s".
> > HINT:  Use ALTER SUBSCRIPTION ... ENABLE to enable subscription if not
> > already enabled or use DROP SUBSCRIPTION ... to drop the
> > subscription.'
> > while running 'psql -XAtq -d port=50352 host=/tmp/WMoRd6ngw2
> > dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP TABLE
> > tab_new' at 
> > /mnt/resource/bf/build/mylodon/HEAD/pgsql.build/../pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> > line 1860.
> > ### Stopping node "node_A" using mode immediate
> >
> > This clearly indicates the problem. We can't drop the relation till it
> > is marked ready in pg_subscription_rel and prior to dropping, the test
> > just does $node_A->wait_for_subscription_sync($node_B, $subname_AB2);.
> > This doesn't ensure that relation is in 'ready' state as it will
> > finish even when the relation is in 'syncdone' state.
>
> I agree with the analysis, adding wait for ready state before dropping
> the table approach looks good to me. I will prepare a patch for the
> same and share it.

The attached patch has the changes to handle the same.

Regards,
Vignesh


0001-Fix-buildfarm-tap-test-failure.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-09-07 Thread vignesh C
On Thu, 8 Sept 2022 at 08:23, Amit Kapila  wrote:
>
> On Thu, Sep 8, 2022 at 7:57 AM vignesh C  wrote:
> >
> > There is a buildfarm failure on mylodon at [1] because of the new test
> > added. I will analyse and share the findings for the same.
> >
> > [1] - 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2022-09-08%2001%3A40%3A27
> >
>
> The log is as below:
>
> error running SQL: 'psql::1: ERROR:  could not drop relation
> mapping for subscription "tap_sub_a_b_2"
> DETAIL:  Table synchronization for relation "tab_new" is in progress
> and is in state "s".
> HINT:  Use ALTER SUBSCRIPTION ... ENABLE to enable subscription if not
> already enabled or use DROP SUBSCRIPTION ... to drop the
> subscription.'
> while running 'psql -XAtq -d port=50352 host=/tmp/WMoRd6ngw2
> dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP TABLE
> tab_new' at 
> /mnt/resource/bf/build/mylodon/HEAD/pgsql.build/../pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> line 1860.
> ### Stopping node "node_A" using mode immediate
>
> This clearly indicates the problem. We can't drop the relation till it
> is marked ready in pg_subscription_rel and prior to dropping, the test
> just does $node_A->wait_for_subscription_sync($node_B, $subname_AB2);.
> This doesn't ensure that relation is in 'ready' state as it will
> finish even when the relation is in 'syncdone' state.

I agree with the analysis, adding wait for ready state before dropping
the table approach looks good to me. I will prepare a patch for the
same and share it.

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-07 Thread Amit Kapila
On Thu, Sep 8, 2022 at 7:57 AM vignesh C  wrote:
>
> There is a buildfarm failure on mylodon at [1] because of the new test
> added. I will analyse and share the findings for the same.
>
> [1] - 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2022-09-08%2001%3A40%3A27
>

The log is as below:

error running SQL: 'psql::1: ERROR:  could not drop relation
mapping for subscription "tap_sub_a_b_2"
DETAIL:  Table synchronization for relation "tab_new" is in progress
and is in state "s".
HINT:  Use ALTER SUBSCRIPTION ... ENABLE to enable subscription if not
already enabled or use DROP SUBSCRIPTION ... to drop the
subscription.'
while running 'psql -XAtq -d port=50352 host=/tmp/WMoRd6ngw2
dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'DROP TABLE
tab_new' at 
/mnt/resource/bf/build/mylodon/HEAD/pgsql.build/../pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
line 1860.
### Stopping node "node_A" using mode immediate

This clearly indicates the problem. We can't drop the relation till it
is marked ready in pg_subscription_rel and prior to dropping, the test
just does $node_A->wait_for_subscription_sync($node_B, $subname_AB2);.
This doesn't ensure that relation is in 'ready' state as it will
finish even when the relation is in 'syncdone' state.

So, I think if we want to drop the tables, we need to poll for 'ready'
state or otherwise, anyway, this is the end of all tests and nodes
won't be reused, so we can remove the clean-up in the end. Any other
ideas?

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-09-07 Thread vignesh C
Hi,

There is a buildfarm failure on mylodon at [1] because of the new test
added. I will analyse and share the findings for the same.

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mylodon&dt=2022-09-08%2001%3A40%3A27

Regards,
Vignesh

On Wed, 7 Sept 2022 at 17:10, vignesh C  wrote:
>
> On Wed, 7 Sept 2022 at 14:34, Amit Kapila  wrote:
> >
> > On Wed, Sep 7, 2022 at 9:53 AM vignesh C  wrote:
> > >
> > > Thanks for the comments, the attached v47 patch has the changes for the 
> > > same.
> > >
> >
> > V47-0001* looks good to me apart from below minor things. I would like
> > to commit this tomorrow unless there are more comments on it.
> >
> > Few minor suggestions:
> > ==
> > 1.
> > + list_free_deep(publist);
> > + pfree(pubnames->data);
> > + pfree(pubnames);
> >
> > I don't think we need to free this memory as it will automatically be
> > released at end of the command (this memory is allocated in portal
> > context). I understand there is no harm in freeing it as well but
> > better to leave it as there doesn't appear to be any danger of leaking
> > memory for a longer time.
>
> Modified
>
> > 2.
> > + res = walrcv_exec(wrconn, cmd.data, 1, tableRow);
> > + pfree(cmd.data);
> >
> > Don't you need to free cmd as well here? I think it is better to avoid
> > freeing it due to reasons mentioned in the previous point.
>
> cmd need not be freed here as we use initStringInfo for initialization
> and initStringInfo allocates memory for data member only.
>
> The attached patch has the changes for the same.
>
> Regards,
> Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-07 Thread vignesh C
On Wed, 7 Sept 2022 at 14:34, Amit Kapila  wrote:
>
> On Wed, Sep 7, 2022 at 9:53 AM vignesh C  wrote:
> >
> > Thanks for the comments, the attached v47 patch has the changes for the 
> > same.
> >
>
> V47-0001* looks good to me apart from below minor things. I would like
> to commit this tomorrow unless there are more comments on it.
>
> Few minor suggestions:
> ==
> 1.
> + list_free_deep(publist);
> + pfree(pubnames->data);
> + pfree(pubnames);
>
> I don't think we need to free this memory as it will automatically be
> released at end of the command (this memory is allocated in portal
> context). I understand there is no harm in freeing it as well but
> better to leave it as there doesn't appear to be any danger of leaking
> memory for a longer time.

Modified

> 2.
> + res = walrcv_exec(wrconn, cmd.data, 1, tableRow);
> + pfree(cmd.data);
>
> Don't you need to free cmd as well here? I think it is better to avoid
> freeing it due to reasons mentioned in the previous point.

cmd need not be freed here as we use initStringInfo for initialization
and initStringInfo allocates memory for data member only.

The attached patch has the changes for the same.

Regards,
Vignesh


v48-0001-Raise-a-warning-if-there-is-a-possibility-of-dat.patch
Description: Binary data


v48-0002-Document-the-steps-for-replication-between-prima.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-09-07 Thread Amit Kapila
On Wed, Sep 7, 2022 at 9:53 AM vignesh C  wrote:
>
> Thanks for the comments, the attached v47 patch has the changes for the same.
>

V47-0001* looks good to me apart from below minor things. I would like
to commit this tomorrow unless there are more comments on it.

Few minor suggestions:
==
1.
+ list_free_deep(publist);
+ pfree(pubnames->data);
+ pfree(pubnames);

I don't think we need to free this memory as it will automatically be
released at end of the command (this memory is allocated in portal
context). I understand there is no harm in freeing it as well but
better to leave it as there doesn't appear to be any danger of leaking
memory for a longer time.

2.
+ res = walrcv_exec(wrconn, cmd.data, 1, tableRow);
+ pfree(cmd.data);

Don't you need to free cmd as well here? I think it is better to avoid
freeing it due to reasons mentioned in the previous point.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-09-07 Thread Amit Kapila
On Wed, Sep 7, 2022 at 1:09 PM shiy.f...@fujitsu.com
 wrote:
>
> On Wed, Sep 7, 2022 12:23 PM vignesh C  wrote:
> >
> >
> > Thanks for the comments, the attached v47 patch has the changes for the
> > same.
> >
>
> Thanks for updating the patch.
>
> Here is a comment.
>
> +   for (i = 0; i < subrel_count; i++)
> +   {
> +   Oid relid = subrel_local_oids[i];
> +   char   *schemaname = 
> get_namespace_name(get_rel_namespace(relid));
> +   char   *tablename = get_rel_name(relid);
> +
> +   appendStringInfo(&cmd, "AND NOT (N.nspname = '%s' AND 
> C.relname = '%s')\n",
> +schemaname, tablename);
> +   }
>
> Would it be better to add "pfree(schemaname)" and "pfree(tablename)" after
> calling appendStringInfo()?
>

No, I don't think we need to do retail pfree of each and every
allocation as these allocations are made in the portal context which
will be freed by the command end.

-- 
With Regards,
Amit Kapila.




RE: Handle infinite recursion in logical replication setup

2022-09-07 Thread shiy.f...@fujitsu.com
On Wed, Sep 7, 2022 12:23 PM vignesh C  wrote:
> 
> 
> Thanks for the comments, the attached v47 patch has the changes for the
> same.
> 

Thanks for updating the patch.

Here is a comment.

+   for (i = 0; i < subrel_count; i++)
+   {
+   Oid relid = subrel_local_oids[i];
+   char   *schemaname = 
get_namespace_name(get_rel_namespace(relid));
+   char   *tablename = get_rel_name(relid);
+
+   appendStringInfo(&cmd, "AND NOT (N.nspname = '%s' AND C.relname 
= '%s')\n",
+schemaname, tablename);
+   }

Would it be better to add "pfree(schemaname)" and "pfree(tablename)" after
calling appendStringInfo()?

Regards,
Shi yu


Re: Handle infinite recursion in logical replication setup

2022-09-06 Thread vignesh C
On Tue, 6 Sept 2022 at 15:25, Amit Kapila  wrote:
>
> On Tue, Sep 6, 2022 at 9:31 AM wangw.f...@fujitsu.com
>  wrote:
> >
> > On Tues, 6 Sept 2022 at 11:14, vignesh C  wrote:
> > > Thanks for the comments, the attached patch has the changes for the same.
> >
> > Thanks for updating the patch.
> >
> > Here is one comment for 0001 patch.
> > 1. The query in function check_publications_origin.
> > +   appendStringInfoString(&cmd,
> > +  "SELECT DISTINCT 
> > P.pubname AS pubname\n"
> > +  "FROM pg_publication 
> > P,\n"
> > +  " LATERAL 
> > pg_get_publication_tables(P.pubname) GPT\n"
> > +  " LEFT JOIN 
> > pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> > +  " pg_class C JOIN 
> > pg_namespace N ON (N.oid = C.relnamespace)\n"
> > +  "WHERE C.oid = GPT.relid 
> > AND PS.srrelid IS NOT NULL AND P.pubname IN (");
> >
> > Since I found that we only use "PS.srrelid" in the WHERE statement by
> > specifying "PS.srrelid IS NOT NULL", could we just use "[INNER] JOIN" to 
> > join
> > the table pg_subscription_rel?
> >
>
> I also think we can use INNER JOIN here but maybe there is a reason
> why that is not used in the first place. If we change it in the code
> then also let's change the same in the docs section as well.

Modified

> Few minor points:
> ===
> 1.
> This scenario is detected and a WARNING is logged to the
> +   user, but the warning is only an indication of a potential problem; it is
> +   the user responsibility to make the necessary checks to ensure the copied
> +   data origins are really as wanted or not.
> +  
>
> /user/user's

Modified

> 2. How about a commit message like:
> Raise a warning if there is a possibility of data from multiple origins.
>
> This commit raises a warning message for a combination of options
> ('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
> operations if the publication tables were also replicated from other
> publishers.
>
> During replication, we can skip the data from other origins as we have that
> information in WAL but that is not possible during initial sync so we raise
> a warning if there is such a possibility.

Yes, this looks good. Modified

Thanks for the comment, the v47 patch attached at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm33T%3D23P-GWvy3O7cT1BfHuGV8dosAw1AVLf40MPvg2bg%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-06 Thread vignesh C
On Tue, 6 Sept 2022 at 09:31, wangw.f...@fujitsu.com
 wrote:
>
> On Tues, 6 Sept 2022 at 11:14, vignesh C  wrote:
> > Thanks for the comments, the attached patch has the changes for the same.
>
> Thanks for updating the patch.
>
> Here is one comment for 0001 patch.
> 1. The query in function check_publications_origin.
> +   appendStringInfoString(&cmd,
> +  "SELECT DISTINCT P.pubname 
> AS pubname\n"
> +  "FROM pg_publication P,\n"
> +  " LATERAL 
> pg_get_publication_tables(P.pubname) GPT\n"
> +  " LEFT JOIN 
> pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +  " pg_class C JOIN 
> pg_namespace N ON (N.oid = C.relnamespace)\n"
> +  "WHERE C.oid = GPT.relid 
> AND PS.srrelid IS NOT NULL AND P.pubname IN (");
>
> Since I found that we only use "PS.srrelid" in the WHERE statement by
> specifying "PS.srrelid IS NOT NULL", could we just use "[INNER] JOIN" to join
> the table pg_subscription_rel?

Thanks for the comment, the v47 patch attached at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm33T%3D23P-GWvy3O7cT1BfHuGV8dosAw1AVLf40MPvg2bg%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-06 Thread vignesh C
On Tue, 6 Sept 2022 at 09:31, shiy.f...@fujitsu.com
 wrote:
>
> On Tue, Sep 6, 2022 11:14 AM vignesh C  wrote:
> >
> > Thanks for the comments, the attached patch has the changes for the same.
> >
>
> Thanks for updating the patch. Here are some comments.
>
> 1.
> +   if (subrel_count)
> +   {
> +   /*
> +* In case of ALTER SUBSCRIPTION ... REFRESH, 
> subrel_local_oids
> +* contains the list of relation oids that are already 
> present on the
> +* subscriber. This check should be skipped for these tables.
> +*/
> +   appendStringInfo(&cmd, "AND C.oid NOT IN (\n"
> +"SELECT C.oid \n"
> +"FROM pg_class C\n"
> +" JOIN pg_namespace N ON 
> N.oid = C.relnamespace\n"
> +"WHERE ");
> +   get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);
> +   appendStringInfo(&cmd, ")");
> +   }
>
> I think we can skip the tables without using a subquery. See the SQL below.
> Would it be better?
>
> SELECT DISTINCT P.pubname AS pubname
> FROM pg_publication P,
>  LATERAL pg_get_publication_tables(P.pubname) GPT
>  LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
>  pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
> WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND P.pubname IN ('p1', 
> 'p3')
> AND NOT (N.nspname = 'public' AND C.relname = 'tbl')
> AND NOT (N.nspname = 'public' AND C.relname = 't1');

Modified

> 2.
> +   When using a subscription parameter combination of
> +   copy_data=true and origin=NONE, the
>
> Could we use "copy_data = true" and "origin = NONE" (add two spaces around the
> equals sign)? I think it looks clearer that way. And it is consistent with 
> other
> places in this file.

Modified

Thanks for the comments, the attached v47 patch has the changes for the same.

Regards,
Vignesh


v47-0001-Raise-a-warning-if-there-is-a-possibility-of-dat.patch
Description: Binary data


v47-0002-Document-the-steps-for-replication-between-prima.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-09-06 Thread Amit Kapila
On Tue, Sep 6, 2022 at 9:31 AM wangw.f...@fujitsu.com
 wrote:
>
> On Tues, 6 Sept 2022 at 11:14, vignesh C  wrote:
> > Thanks for the comments, the attached patch has the changes for the same.
>
> Thanks for updating the patch.
>
> Here is one comment for 0001 patch.
> 1. The query in function check_publications_origin.
> +   appendStringInfoString(&cmd,
> +  "SELECT DISTINCT P.pubname 
> AS pubname\n"
> +  "FROM pg_publication P,\n"
> +  " LATERAL 
> pg_get_publication_tables(P.pubname) GPT\n"
> +  " LEFT JOIN 
> pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +  " pg_class C JOIN 
> pg_namespace N ON (N.oid = C.relnamespace)\n"
> +  "WHERE C.oid = GPT.relid 
> AND PS.srrelid IS NOT NULL AND P.pubname IN (");
>
> Since I found that we only use "PS.srrelid" in the WHERE statement by
> specifying "PS.srrelid IS NOT NULL", could we just use "[INNER] JOIN" to join
> the table pg_subscription_rel?
>

I also think we can use INNER JOIN here but maybe there is a reason
why that is not used in the first place. If we change it in the code
then also let's change the same in the docs section as well.

Few minor points:
===
1.
This scenario is detected and a WARNING is logged to the
+   user, but the warning is only an indication of a potential problem; it is
+   the user responsibility to make the necessary checks to ensure the copied
+   data origins are really as wanted or not.
+  

/user/user's

2. How about a commit message like:
Raise a warning if there is a possibility of data from multiple origins.

This commit raises a warning message for a combination of options
('copy_data = true' and 'origin = none') during CREATE/ALTER subscription
operations if the publication tables were also replicated from other
publishers.

During replication, we can skip the data from other origins as we have that
information in WAL but that is not possible during initial sync so we raise
a warning if there is such a possibility.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-09-05 Thread Peter Smith
Thanks for addressing my previous review comments. I have no more
comments for v46*.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Handle infinite recursion in logical replication setup

2022-09-05 Thread wangw.f...@fujitsu.com
On Tues, 6 Sept 2022 at 11:14, vignesh C  wrote:
> Thanks for the comments, the attached patch has the changes for the same.

Thanks for updating the patch.

Here is one comment for 0001 patch.
1. The query in function check_publications_origin.
+   appendStringInfoString(&cmd,
+  "SELECT DISTINCT P.pubname 
AS pubname\n"
+  "FROM pg_publication P,\n"
+  " LATERAL 
pg_get_publication_tables(P.pubname) GPT\n"
+  " LEFT JOIN 
pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+  " pg_class C JOIN 
pg_namespace N ON (N.oid = C.relnamespace)\n"
+  "WHERE C.oid = GPT.relid AND 
PS.srrelid IS NOT NULL AND P.pubname IN (");

Since I found that we only use "PS.srrelid" in the WHERE statement by
specifying "PS.srrelid IS NOT NULL", could we just use "[INNER] JOIN" to join
the table pg_subscription_rel?

Regards,
Wang wei


RE: Handle infinite recursion in logical replication setup

2022-09-05 Thread shiy.f...@fujitsu.com
On Tue, Sep 6, 2022 11:14 AM vignesh C  wrote:
> 
> Thanks for the comments, the attached patch has the changes for the same.
> 

Thanks for updating the patch. Here are some comments.

1.
+   if (subrel_count)
+   {
+   /*
+* In case of ALTER SUBSCRIPTION ... REFRESH, subrel_local_oids
+* contains the list of relation oids that are already present 
on the
+* subscriber. This check should be skipped for these tables.
+*/
+   appendStringInfo(&cmd, "AND C.oid NOT IN (\n"
+"SELECT C.oid \n"
+"FROM pg_class C\n"
+" JOIN pg_namespace N ON 
N.oid = C.relnamespace\n"
+"WHERE ");
+   get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);
+   appendStringInfo(&cmd, ")");
+   }

I think we can skip the tables without using a subquery. See the SQL below.
Would it be better?

SELECT DISTINCT P.pubname AS pubname
FROM pg_publication P,
 LATERAL pg_get_publication_tables(P.pubname) GPT
 LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
 pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND P.pubname IN ('p1', 'p3')
AND NOT (N.nspname = 'public' AND C.relname = 'tbl')
AND NOT (N.nspname = 'public' AND C.relname = 't1');


2.
+   When using a subscription parameter combination of
+   copy_data=true and origin=NONE, the

Could we use "copy_data = true" and "origin = NONE" (add two spaces around the
equals sign)? I think it looks clearer that way. And it is consistent with other
places in this file.

Regards,
Shi yu


Re: Handle infinite recursion in logical replication setup

2022-09-05 Thread vignesh C
On Mon, 5 Sept 2022 at 15:10, Amit Kapila  wrote:
>
> On Mon, Sep 5, 2022 at 9:47 AM Peter Smith  wrote:
> >
> > Here are my review comments for v45-0001:
> >
> > ==
> >
> > 1. doc/src/sgml/logical-replication.sgml
> >
> >   
> >To find which tables might potentially include non-local origins (due to
> >other subscriptions created on the publisher) try this SQL query:
> > 
> > SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
> > FROM pg_publication P,
> >  LATERAL pg_get_publication_tables(P.pubname) GPT
> >  LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
> >  pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
> > WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
> >   P.pubname IN (pub-names);
> > 
> >
> > 1a.
> > To use "" with the <> then simply put meta characters in the 
> > SGML.
> > e.g.
> > 
> >
> > ~
> >
> > 1b.
> > The patch forgot to add the SQL "#" instruction as suggested by my
> > previous comment (see [1] #3b)
> >
> > ~~~
> >
> > 2.
> >
> >  
> >   Preventing Data Inconsistencies for copy_data, origin=NONE
> >
> > The title is OK, but I think this should all be a  sub-section
> > of "31.2 Subscription"
> >
> > ==
> >
>
> It is recommended to create the subscription
> +   using enabled=false, so that if the origin WARNING 
> occurs
> +   no copy has happened yet. Otherwise some corrective steps might be needed 
> to
> +   remove any unwanted data that got copied.
>
> I am not completely sure of this part of the docs as this can add
> additional steps for users while working on subscriptions even when
> the same is not required. I suggest for now we can remove this part.
> Later based on some feedback on this feature, we can extend the docs
> if required.

Modified

> Also, instead of having it as a separate section, let's keep this as
> part of create_subscription.sgml

Modified

> *
> + errhint("Verify that initial data copied from the publisher tables
> did not come from other origins. Some corrective action may be
> necessary."));
>
> The second sentence in this message doesn't seem to be required.

Modified

Thanks for the comments, the v46 patch at [1] has the changes for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm3TTGdCCkeDsN8hqtF_2z-8%2B%3D3tc9Gh5xOKAQ_BRMCkMA%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-05 Thread vignesh C
On Mon, 5 Sept 2022 at 09:47, Peter Smith  wrote:
>
> Here are my review comments for v45-0001:
>
> ==
>
> 1. doc/src/sgml/logical-replication.sgml
>
>   
>To find which tables might potentially include non-local origins (due to
>other subscriptions created on the publisher) try this SQL query:
> 
> SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
> FROM pg_publication P,
>  LATERAL pg_get_publication_tables(P.pubname) GPT
>  LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
>  pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
> WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
>   P.pubname IN (pub-names);
> 
>
> 1a.
> To use "" with the <> then simply put meta characters in the SGML.
> e.g.
> 

Modified

> ~
>
> 1b.
> The patch forgot to add the SQL "#" instruction as suggested by my
> previous comment (see [1] #3b)

Modified

> ~~~
>
> 2.
>
>  
>   Preventing Data Inconsistencies for copy_data, origin=NONE
>
> The title is OK, but I think this should all be a  sub-section
> of "31.2 Subscription"

I have moved it to create subscription notes based on a recent comment
from Amit.

> ==
>
> 3. src/backend/commands/subscriptioncmds.c - check_publications_origin
>
> + initStringInfo(&cmd);
> + appendStringInfoString(&cmd,
> +"SELECT DISTINCT P.pubname AS pubname\n"
> +"FROM pg_publication P,\n"
> +" LATERAL pg_get_publication_tables(P.pubname) GPT\n"
> +" LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
> +" pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
> +"WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND P.pubname IN (");
> + get_publications_str(publications, &cmd, true);
> + appendStringInfoChar(&cmd, ')');
> + get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);
>
> (see from get_skip_tables_str)
>
> + appendStringInfoString(dest, " AND C.oid NOT IN (SELECT C.oid FROM
> pg_class C JOIN pg_namespace N on N.oid = C.relnamespace where ");
>
>
> IMO the way you are using get_skip_tables_str should be modified. I
> will show by way of example below.
> - "where" -> "WHERE"
> - put the SELECT at the caller instead of inside the function
> - handle the ")" at the caller
>
> All this will also make the body of the 'get_skip_tables_str' function
> much simpler (see the next review comments)
>
> SUGGESTION
> if (subrel_count > 0)
> {
> /* TODO - put some explanatory comment here about skipping the tables */
> appendStringInfo(&cmd,
> " AND C.oid NOT IN (\n"
> "SELECT C.oid FROM pg_class C\n"
> "JOIN pg_namespace N on N.oid = C.relnamespace WHERE ");
> get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);
> appendStringInf(&cmd, “)”);
> }

Modified

> ~~~
>
> 4. src/backend/commands/subscriptioncmds.c - get_skip_tables_str
>
> +/*
> + * Add the table names that should be skipped.
> + */
>
> This comment does not have enough detail to know really what the
> function is for. Perhaps you only need to say that this is a helper
> function for 'check_publications_origin' and then where it is called
> you can give a comment (e.g. see my review comment #3)

Modified

> ~~
>
> 5. get_skip_tables_str (body)
>
> 5a. Variable 'count' is not a very good name; IMO just say 'i' for
> index, and it can be declared C99 style.

Modified

> ~
>
> 5b. Variable 'first' is not necessary - it's same as (i == 0)

Modified

> ~
>
> 5c. The whole "SELECT" part and the ")" parts are more simply done at
> the caller (see the review comment #3)

Modified

> ~
>
> 5d.
>
> + appendStringInfo(dest, "(C.relname = '%s' and N.nspname = '%s')",
> + tablename, schemaname);
>
> It makes no difference but I thought it would feel more natural if the
> SQL would compare the schema name *before* the table name.

Modified

> ~
>
> 5e.
> "and" -> "AND"

Modified

> ~
>
> Doing all 5a,b,c,d, and e means overall having a much simpler function body:
>
> SUGGESTION
> + for (int i = 0; i < subrel_count; i++)
> + {
> + Oid relid = subrel_local_oids[i];
> + char*schemaname = get_namespace_name(get_rel_namespace(relid));
> + char*tablename = get_rel_name(relid);
> +
> + if (i > 0)
> + appendStringInfoString(dest, " OR ");
> +
> + appendStringInfo(dest, "(N.nspname = '%s' AND C.relname = '%s')",
> + schemaname, tablename);
> + }

Modified

Thanks for the comments, the attached patch has the changes for the same.

Regards,
Vignesh


v46-0001-Check-and-log-a-warning-if-the-publisher-has-sub.patch
Description: Binary data


v46-0002-Document-the-steps-for-replication-between-prima.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-09-05 Thread Amit Kapila
On Mon, Sep 5, 2022 at 9:47 AM Peter Smith  wrote:
>
> Here are my review comments for v45-0001:
>
> ==
>
> 1. doc/src/sgml/logical-replication.sgml
>
>   
>To find which tables might potentially include non-local origins (due to
>other subscriptions created on the publisher) try this SQL query:
> 
> SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
> FROM pg_publication P,
>  LATERAL pg_get_publication_tables(P.pubname) GPT
>  LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
>  pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
> WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
>   P.pubname IN (pub-names);
> 
>
> 1a.
> To use "" with the <> then simply put meta characters in the SGML.
> e.g.
> 
>
> ~
>
> 1b.
> The patch forgot to add the SQL "#" instruction as suggested by my
> previous comment (see [1] #3b)
>
> ~~~
>
> 2.
>
>  
>   Preventing Data Inconsistencies for copy_data, origin=NONE
>
> The title is OK, but I think this should all be a  sub-section
> of "31.2 Subscription"
>
> ==
>

It is recommended to create the subscription
+   using enabled=false, so that if the origin WARNING occurs
+   no copy has happened yet. Otherwise some corrective steps might be needed to
+   remove any unwanted data that got copied.

I am not completely sure of this part of the docs as this can add
additional steps for users while working on subscriptions even when
the same is not required. I suggest for now we can remove this part.
Later based on some feedback on this feature, we can extend the docs
if required.

Also, instead of having it as a separate section, let's keep this as
part of create_subscription.sgml

*
+ errhint("Verify that initial data copied from the publisher tables
did not come from other origins. Some corrective action may be
necessary."));

The second sentence in this message doesn't seem to be required.


-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-09-04 Thread Peter Smith
Here are my review comments for v45-0001:

==

1. doc/src/sgml/logical-replication.sgml

  
   To find which tables might potentially include non-local origins (due to
   other subscriptions created on the publisher) try this SQL query:

SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
FROM pg_publication P,
 LATERAL pg_get_publication_tables(P.pubname) GPT
 LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
 pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
  P.pubname IN (pub-names);


1a.
To use "" with the <> then simply put meta characters in the SGML.
e.g.


~

1b.
The patch forgot to add the SQL "#" instruction as suggested by my
previous comment (see [1] #3b)

~~~

2.

 
  Preventing Data Inconsistencies for copy_data, origin=NONE

The title is OK, but I think this should all be a  sub-section
of "31.2 Subscription"

==

3. src/backend/commands/subscriptioncmds.c - check_publications_origin

+ initStringInfo(&cmd);
+ appendStringInfoString(&cmd,
+"SELECT DISTINCT P.pubname AS pubname\n"
+"FROM pg_publication P,\n"
+" LATERAL pg_get_publication_tables(P.pubname) GPT\n"
+" LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+" pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+"WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND P.pubname IN (");
+ get_publications_str(publications, &cmd, true);
+ appendStringInfoChar(&cmd, ')');
+ get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);

(see from get_skip_tables_str)

+ appendStringInfoString(dest, " AND C.oid NOT IN (SELECT C.oid FROM
pg_class C JOIN pg_namespace N on N.oid = C.relnamespace where ");


IMO the way you are using get_skip_tables_str should be modified. I
will show by way of example below.
- "where" -> "WHERE"
- put the SELECT at the caller instead of inside the function
- handle the ")" at the caller

All this will also make the body of the 'get_skip_tables_str' function
much simpler (see the next review comments)

SUGGESTION
if (subrel_count > 0)
{
/* TODO - put some explanatory comment here about skipping the tables */
appendStringInfo(&cmd,
" AND C.oid NOT IN (\n"
"SELECT C.oid FROM pg_class C\n"
"JOIN pg_namespace N on N.oid = C.relnamespace WHERE ");
get_skip_tables_str(subrel_local_oids, subrel_count, &cmd);
appendStringInf(&cmd, “)”);
}

~~~

4. src/backend/commands/subscriptioncmds.c - get_skip_tables_str

+/*
+ * Add the table names that should be skipped.
+ */

This comment does not have enough detail to know really what the
function is for. Perhaps you only need to say that this is a helper
function for 'check_publications_origin' and then where it is called
you can give a comment (e.g. see my review comment #3)

~~

5. get_skip_tables_str (body)

5a. Variable 'count' is not a very good name; IMO just say 'i' for
index, and it can be declared C99 style.

~

5b. Variable 'first' is not necessary - it's same as (i == 0)

~

5c. The whole "SELECT" part and the ")" parts are more simply done at
the caller (see the review comment #3)

~

5d.

+ appendStringInfo(dest, "(C.relname = '%s' and N.nspname = '%s')",
+ tablename, schemaname);

It makes no difference but I thought it would feel more natural if the
SQL would compare the schema name *before* the table name.

~

5e.
"and" -> "AND"

~

Doing all 5a,b,c,d, and e means overall having a much simpler function body:

SUGGESTION
+ for (int i = 0; i < subrel_count; i++)
+ {
+ Oid relid = subrel_local_oids[i];
+ char*schemaname = get_namespace_name(get_rel_namespace(relid));
+ char*tablename = get_rel_name(relid);
+
+ if (i > 0)
+ appendStringInfoString(dest, " OR ");
+
+ appendStringInfo(dest, "(N.nspname = '%s' AND C.relname = '%s')",
+ schemaname, tablename);
+ }

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPsku25%2BVjz7HiohWxc2WU07O_ZV4voFG%2BU7WzwKhUzpGQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-09-02 Thread vignesh C
On Fri, 2 Sept 2022 at 13:51, Peter Smith  wrote:
>
> Here are my review comments for the latest patch v44-0001.
>
> ==
>
> 1. doc/src/sgml/logical-replication.sgml
>
> + 
> +  Specifying origins for subscription
>
> I thought the title is OK, but maybe can be better. OTOH, I am not
> sure if my suggestions below are improvements or not. Anyway, even if
> the title says the same, the convention is to use uppercase words.
>
> Something like:
> "Specifying Origins for Subscriptions"
> "Specifying Data Origins for Subscriptions"
> "Specifying Data Origins in CREATE SUBSCRIPTION"
> "Subscription Data Origins"

Modified to "Preventing Data Inconsistencies for copy_data,
origin=NONE" to avoid confusions

> ~~~
>
> 2.
>
> Currently, this new section in this patch is only discussing the
> *problems* users might encounter for using copy_data,origin=NONE, but
> I think a section with a generic title about "subscription origins"
> should be able to stand alone. For example, it should give some brief
> mention of what is the meaning/purpose of origin=ANY and origin=NONE.
> And it should xref back to CREATE SUBSCRIPTION page.
>
> IMO all this content currently in the patch maybe belongs in a
> sub-section of this new section (e.g. call it something like
> "Preventing Data Inconsistencies for copy_data,origin=NONE")

I wanted to just document the inconsistency issue when copy_data and
origin is used. I felt the origin information was already present in
create subscription page. I have not documented the origin again here.
I have renamed the section to "Preventing Data Inconsistencies for
copy_data, origin=NONE" to avoid any confusions.

> ~~~
>
> 3.
>
> +   To find which tables might potentially include non-local origins (due to
> +   other subscriptions created on the publisher) try this SQL query by
> +   specifying the publications in IN condition:
> +
> +SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
> +FROM pg_publication P,
> + LATERAL pg_get_publication_tables(P.pubname) GPT
> + LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
> + pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
> +WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
> +  P.pubname IN (...);
> +
> +  
>
> 3a.
> I think the "To find..." sentence should be a new paragraph.

modified

> ~
>
> 3b.
> Instead of saying "specifying the publications in IN condition" I
> think it might be simpler if those instructions are part of the SQL
>
> SUGGESTION
> +
> +# substitute  below with your publication name(s) to be queried
> +SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
> +FROM pg_publication P,
> + LATERAL pg_get_publication_tables(P.pubname) GPT
> + LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
> + pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
> +WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
> +  P.pubname IN ();
> +

 modified, I could not use  since it was considering it as a
tag, instead I have changed it to pub-names

> ~
>
> 3c.
> 
>   
>
> I recall there was a commit or hackers post sometime earlier this year
> that reported it is better for these particular closing tags to be
> done together like  because otherwise there is
> some case where the whitespace might not render correctly.
> Unfortunately, I can't seem to find the evidence of that post/commit,
> but I am sure I saw something about this because I have been using
> that practice ever since I saw it.

Modified

> ==
>
> 4. doc/src/sgml/ref/alter_subscription.sgml
>
> + 
> +  Refer to the  linkend="specifying-origins-for-subscription"/> about how
> +  copy_data = true can interact with the
> +  origin parameter.
> + 
>
> Previously when this xref pointed to the "Note" section the sentence
> looked OK (it said, "Refer to the Note about how...") but now the word
> is not "Note" anymore, (e.g. "Refer to the Section 31.3 about how
> copy_data = true can interact with the origin parameter. "), so I
> think this should be tweaked...
>
> "Refer to the XXX about how..." -> "See XXX for details of how..."

Modified

> ==
>
> 5. doc/src/sgml/ref/create_subscription.sgml
>
> Save as comment #4 (2 times)

Modified

> ==
>
> 6. src/backend/commands/subscriptioncmds.c
>
> + if (bsearch(&relid, subrel_local_oids,
> + subrel_count, sizeof(Oid), oid_cmp))
> + isnewtable = false;
>
> SUGGESTION (search PG source - there are examples like this)
> newtable = bsearch(&relid, subrel_local_oids, subrel_count,
> sizeof(Oid), oid_cmp);

This code has been removed as part of another comment, the comment no
more applies.

> ~~~
>
> 7.
>
> + if (list_length(publist))
> + {
>
> I think the proper way to test for non-empty List is
>
> if (publist != NIL) { ... }
>
> or simply
>
> if (publist) { ... }

Modified

> ~~~
>
> 8.
>
> + errmsg("subscription \"%s\" requested origin=NONE but might copy
> data that had a different origin",
>
> Do

Re: Handle infinite recursion in logical replication setup

2022-09-02 Thread vignesh C
On Mon, 29 Aug 2022 at 16:35, Amit Kapila  wrote:
>
> On Wed, Aug 24, 2022 at 7:27 PM vignesh C  wrote:
> >
> > Since there was no objections to change it to throw a warning, I have
> > made the changes for the same.
> >
>
> Review comments for v42-0001*
> ==
> 1. Can we improve the query in check_pub_table_subscribed() so that it
> doesn't fetch any table that is already part of the subscription on
> the subscriber? This may be better than what the patch is doing which
> is to first fetch such information and then skip it. If forming this
> query turns out to be too complex then we can retain your method as
> well but I feel it is worth trying to optimize the query used in the
> patch.

I have modified the query to skip the tables as part of the query. The
attached v45 patch has the changes for the same.

Regards,
Vignesh


v45-0001-Check-and-log-a-warning-if-the-publisher-has-sub.patch
Description: Binary data


v45-0002-Document-the-steps-for-replication-between-prima.patch
Description: Binary data


Re: Handle infinite recursion in logical replication setup

2022-09-02 Thread Peter Smith
On Fri, Sep 2, 2022 at 6:21 PM Peter Smith  wrote:
>
> Here are my review comments for the latest patch v44-0001.
>
...
>
> 6. src/backend/commands/subscriptioncmds.c
>
> + if (bsearch(&relid, subrel_local_oids,
> + subrel_count, sizeof(Oid), oid_cmp))
> + isnewtable = false;
>
> SUGGESTION (search PG source - there are examples like this)
> newtable = bsearch(&relid, subrel_local_oids, subrel_count,
> sizeof(Oid), oid_cmp);
>

Oops. Of course, I meant !bsearch

SUGGESTION
newtable = !bsearch(&relid, subrel_local_oids, subrel_count,
sizeof(Oid), oid_cmp);

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-09-02 Thread Peter Smith
Here are my review comments for the latest patch v44-0001.

==

1. doc/src/sgml/logical-replication.sgml

+ 
+  Specifying origins for subscription

I thought the title is OK, but maybe can be better. OTOH, I am not
sure if my suggestions below are improvements or not. Anyway, even if
the title says the same, the convention is to use uppercase words.

Something like:
"Specifying Origins for Subscriptions"
"Specifying Data Origins for Subscriptions"
"Specifying Data Origins in CREATE SUBSCRIPTION"
"Subscription Data Origins"

~~~

2.

Currently, this new section in this patch is only discussing the
*problems* users might encounter for using copy_data,origin=NONE, but
I think a section with a generic title about "subscription origins"
should be able to stand alone. For example, it should give some brief
mention of what is the meaning/purpose of origin=ANY and origin=NONE.
And it should xref back to CREATE SUBSCRIPTION page.

IMO all this content currently in the patch maybe belongs in a
sub-section of this new section (e.g. call it something like
"Preventing Data Inconsistencies for copy_data,origin=NONE")

~~~

3.

+   To find which tables might potentially include non-local origins (due to
+   other subscriptions created on the publisher) try this SQL query by
+   specifying the publications in IN condition:
+
+SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
+FROM pg_publication P,
+ LATERAL pg_get_publication_tables(P.pubname) GPT
+ LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
+ pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
+WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
+  P.pubname IN (...);
+
+  

3a.
I think the "To find..." sentence should be a new paragraph.

~

3b.
Instead of saying "specifying the publications in IN condition" I
think it might be simpler if those instructions are part of the SQL

SUGGESTION
+
+# substitute  below with your publication name(s) to be queried
+SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename
+FROM pg_publication P,
+ LATERAL pg_get_publication_tables(P.pubname) GPT
+ LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),
+ pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)
+WHERE C.oid = GPT.relid AND PS.srrelid IS NOT NULL AND
+  P.pubname IN ();
+

~

3c.

  

I recall there was a commit or hackers post sometime earlier this year
that reported it is better for these particular closing tags to be
done together like  because otherwise there is
some case where the whitespace might not render correctly.
Unfortunately, I can't seem to find the evidence of that post/commit,
but I am sure I saw something about this because I have been using
that practice ever since I saw it.

==

4. doc/src/sgml/ref/alter_subscription.sgml

+ 
+  Refer to the  about how
+  copy_data = true can interact with the
+  origin parameter.
+ 

Previously when this xref pointed to the "Note" section the sentence
looked OK (it said, "Refer to the Note about how...") but now the word
is not "Note" anymore, (e.g. "Refer to the Section 31.3 about how
copy_data = true can interact with the origin parameter. "), so I
think this should be tweaked...

"Refer to the XXX about how..." -> "See XXX for details of how..."

==

5. doc/src/sgml/ref/create_subscription.sgml

Save as comment #4 (2 times)

==

6. src/backend/commands/subscriptioncmds.c

+ if (bsearch(&relid, subrel_local_oids,
+ subrel_count, sizeof(Oid), oid_cmp))
+ isnewtable = false;

SUGGESTION (search PG source - there are examples like this)
newtable = bsearch(&relid, subrel_local_oids, subrel_count,
sizeof(Oid), oid_cmp);

~~~

7.

+ if (list_length(publist))
+ {

I think the proper way to test for non-empty List is

if (publist != NIL) { ... }

or simply

if (publist) { ... }

~~~

8.

+ errmsg("subscription \"%s\" requested origin=NONE but might copy
data that had a different origin",

Do you think this should say "requested copy_data with origin=NONE",
instead of just "requested origin=NONE"?


--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-09-01 Thread vignesh C
On Thu, 1 Sept 2022 at 09:19, Amit Kapila  wrote:
>
> On Wed, Aug 31, 2022 at 11:35 AM Peter Smith  wrote:
> >
> > Here are my review comments for v43-0001.
> >
> > ==
> >
> > 1. Commit message
> >
> > The initial copy phase has no way to know the origin of the row data,
> > so if 'copy_data = true' in the step 4 below, log a warning to notify user
> > that potentially non-local data might have been copied.
> >
> > 1a.
> > "in the step 4" -> "in step 4"
> >
> > ~
> >
> > 1b.
> > "notify user" -> "notify the user"
> >
> > ==
> >
> > 2. doc/src/sgml/ref/create_subscription.sgml
> >
> > +  
> > +   If the subscription is created with origin = none and
> > +   copy_data = true, it will check if the publisher has
> > +   subscribed to the same table from other publishers and, if so, log a
> > +   warning to notify the user to check the publisher tables. Before 
> > continuing
> > +   with other operations the user should check that publisher tables did 
> > not
> > +   have data with different origins to prevent data inconsistency issues 
> > on the
> > +   subscriber.
> > +  
> >
> > I am not sure about that wording saying "to prevent data inconsistency
> > issues" because I thought maybe is already too late because any
> > unwanted origin data is already copied during the initial sync. I
> > think the user can do it try to clean up any problems caused before
> > things become even worse (??)
> >
> > ~~~
> >
> > You asked for my thoughts (see [1] 6b):
> >
> > > There is nothing much a user can do in this case. Only option would be
> > > to take a backup before the operation and restore it and then recreate
> > > the replication setup. I was not sure if we should document these
> > > steps as similar inconsistent data could get created without the
> > > origin option . Thoughts?
> >
> > I think those cases are a bit different:
> >
> > - For a normal scenario (i.e. origin=ANY) then inconsistent data just
> > refers to problems like maybe PK violations so I think you just get
> > what you get and have to deal with the errors...
> >
> > - But when the user said origin=NONE they are specifically saying they
> > do NOT want any non-local data, yet they still might end up getting
> > data contrary to their wishes. So I think maybe there ought to be
> > something documented to warn about this potentially happening. My
> > first impression is that all this seems like a kind of terrible
> > problem because if it happens then cleanup could be difficult - if
> > that subscriber in turn was also a publisher then this bad data might
> > have propagated all over the place already! Anyway, I guess all this
> > could be mentioned in some (new ?) section of the
> > logical-replication.sgml file (i.e. patch 0002).
> >
> > IDEA: I was wondering if the documentation should recommend (or maybe
> > we can even enforce this) that when using origin=NONE the user should
> > always create the subscription with enabled=FALSE. That way if there
> > are some warnings about potential bad origin data then there it might
> > be possible to take some action or change their mind *before* any
> > unwanted data pollutes everything.
> >
> > ==
> >
> > 3. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
> >
> > + if (count == 0)
> > + tablenames = makeStringInfo();
> > + else
> > + appendStringInfoString(tablenames, ", ");
> > +
> > + appendStringInfo(tablenames, "%s.%s", nspname, relname);
> > + count++;
> >
> > I think the quotes are not correct - each separate table name should be 
> > quoted.
> >
> > ~~~
> >
> > 4.
> >
> > + /*
> > + * There might be many tables present in this case, we will display a
> > + * maximum of 100 tables followed by "..." to indicate there might be
> > + * more tables.
> > + */
> > + if (count == 100)
> > + {
> > + appendStringInfoString(tablenames, ", ...");
> > + break;
> > + }
> >
> > 4a.
> > IMO this code should be part of the previous if/else
> >
> > e.g.
> > if (count == 0) ... makeStringInfo()
> > else if (count > 100) ... append ellipsis "..." and break out of the loop
> > else ... append table name to the message
> >
> > Doing it in this way means "..." definitely means there are more than
> > 100 bad tables.
> >
> > ~
> >
> > 4b.
> > Changing like above (#4a) simplifies the comment because it removes
> > doubts like "might be…" since you already know you found the 101st
> > table.
> >
> > SUGGESTION (comment)
> > The message displays a maximum of 100 tables having potentially
> > non-local data. Any more than that will be indicated by "...".
> >
> > ~
> >
>
> Are we using this style of an error message anywhere else in the code?
> If not, then I think we should avoid it here as well as it appears a
> bit adhoc to me in the sense that why restrict at 100 tables. Can we
> instead think of displaying the publications list in the message? So
> errdetail would be something like: Subscribed publications \"%s\" has
> been subscribed from some other publisher. Then we can probably giv

Re: Handle infinite recursion in logical replication setup

2022-09-01 Thread vignesh C
On Thu, 1 Sept 2022 at 08:01, shiy.f...@fujitsu.com
 wrote:
>
> On Wed, Aug 31, 2022 1:06 AM vignesh C  wrote:
> >
> > The attached v43 patch has the changes for the same.
> >
>
> Thanks for updating the patch.
>
> Here is a comment on the 0001 patch.
>
> +   if (!isnewtable)
> +   {
> +   pfree(nspname);
> +   pfree(relname);
> +   continue;
> +   }
>
> If it is a new table, in which case it would log a warning, should we also 
> call
> pfree()?

Yes it should be added, I have fixed the same in the v44 patch attached at [1].
[1] -  
https://www.postgresql.org/message-id/CALDaNm0NRJ1O1cYcZD%3Df7NgynozFprb7zpJSayFN5rcaS44G6Q%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-01 Thread vignesh C
On Wed, 31 Aug 2022 at 11:45, Peter Smith  wrote:
>
> Here are some review comments for patch v43-0002:
>
> ==
>
> 1. doc/src/sgml/ref/create_subscription.sgml
>
> @@ -403,7 +403,9 @@ CREATE SUBSCRIPTION  class="parameter">subscription_name warning to notify the user to check the publisher tables. Before 
> continuing
> with other operations the user should check that publisher tables did not
> have data with different origins to prevent data inconsistency issues on 
> the
> -   subscriber.
> +   subscriber. Refer to  for
> +   how copy_data and origin can be used
> +   to set up replication between primaries.
>
>
> Regarding my earlier v43-0001 review (see [1] comment #2) perhaps
> another pg docs section should be added in the
> logical-replication.sgml (e.g. "Specifying origins during CREATE
> SUBSCRIPTION"), so then this Notes text also should have more added to
> it.
>
> SUGGESTION
> Refer to  for details about potential initialization
> inconsistency warnings using origin=NONE.
> Refer to  for how copy_data and origin can be used to set up
> replication between primaries.

I have moved all these contents to a separate section in the
logical-replication page. I have referred to this link from the
documentation of origin and copy_data parameter. I have also referred
to "setting up replication between primaries" in the newly added
section. Since this new section is referred to from other places, I
felt we need not provide a link from create_subscription notes. The
changes for the same are available at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm0NRJ1O1cYcZD%3Df7NgynozFprb7zpJSayFN5rcaS44G6Q%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-01 Thread vignesh C
On Mon, 29 Aug 2022 at 16:35, Amit Kapila  wrote:
>
> On Wed, Aug 24, 2022 at 7:27 PM vignesh C  wrote:
> >
> > Since there was no objections to change it to throw a warning, I have
> > made the changes for the same.
> >
>
> Review comments for v42-0001*
> ==
> 1. Can we improve the query in check_pub_table_subscribed() so that it
> doesn't fetch any table that is already part of the subscription on
> the subscriber? This may be better than what the patch is doing which
> is to first fetch such information and then skip it. If forming this
> query turns out to be too complex then we can retain your method as
> well but I feel it is worth trying to optimize the query used in the
> patch.

I'm analysing this, I will post a reply for this soon.

> 2. I thought that it may be better to fetch all the tables that have
> the possibility to have data from more than one origin but maybe the
> list will be too long especially for FOR ALL TABLE types of cases. Is
> this the reason you have decided to give a WARNING as soon as you see
> any such table? If so, probably adding a comment for it can be good.

Yes that was the reason, now I have changed it to display the
publications based on your recent comment whose count will be
significantly very much lesser compared to the number of tables.

> 3.
> + $node_publisher->wait_for_catchup($sub_name);
> +
> + # also wait for initial table sync to finish
> + $node_subscriber->poll_query_until('postgres', $synced_query)
> +   or die "Timed out while waiting for subscriber to synchronize data";
> 
> 
> 
> +$node_B->safe_psql(
> + 'postgres', "
> +ALTER SUBSCRIPTION $subname_BA REFRESH PUBLICATION");
> +
> +$node_B->poll_query_until('postgres', $synced_query)
> +  or die "Timed out while waiting for subscriber to synchronize data";
>
> You can replace this and any similar code in the patch with a method
> used in commit 0c20dd33db.

Modified

> 4.
> +###
> +# Join 3rd node (node_C) to the existing 2 nodes(node_A & node_B) 
> bidirectional
> +# replication setup when the existing nodes (node_A & node_B) has 
> pre-existing
> +# data and the new node (node_C) does not have any data.
> +###
> +$result = $node_A->safe_psql('postgres', "SELECT * FROM tab ORDER BY 1;");
> +is($result, qq(), 'Check existing data');
> +
> +$result = $node_B->safe_psql('postgres', "SELECT * FROM tab ORDER BY 1;");
> +is($result, qq(), 'Check existing data');
> +
> +$result = $node_C->safe_psql('postgres', "SELECT * FROM tab ORDER BY 1;");
> +is($result, qq(), 'Check existing data');
>
> The comments say that for this test, two of the nodes have some
> pre-existing data but I don't see the same in the test results. The
> test following this one will also have a similar effect. BTW, I am not
> sure all the new three node tests added by this patch are required
> because I don't see if they have additional code coverage or test
> anything which is not tested without those. This has doubled the
> amount of test timing for this test file which is okay if these tests
> make any useful contribution but otherwise, it may be better to remove
> these. Am, I missing something?

Earlier we felt with the origin patch we could support joining of
nodes to existing n-wary replication setup in various scenarios, I had
added the tests for the same scenarios. But as the patch evolved, I
could understand that this patch cannot handle the add node scenarios.
I have removed these tests.

Thanks for the comments, the v44 patch attached at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm0NRJ1O1cYcZD%3Df7NgynozFprb7zpJSayFN5rcaS44G6Q%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-09-01 Thread vignesh C
On Wed, 31 Aug 2022 at 11:35, Peter Smith  wrote:
>
> Here are my review comments for v43-0001.
>
> ==
>
> 1. Commit message
>
> The initial copy phase has no way to know the origin of the row data,
> so if 'copy_data = true' in the step 4 below, log a warning to notify user
> that potentially non-local data might have been copied.
>
> 1a.
> "in the step 4" -> "in step 4"

Modified

> ~
>
> 1b.
> "notify user" -> "notify the user"

Modified

> ==
>
> 2. doc/src/sgml/ref/create_subscription.sgml
>
> +  
> +   If the subscription is created with origin = none and
> +   copy_data = true, it will check if the publisher has
> +   subscribed to the same table from other publishers and, if so, log a
> +   warning to notify the user to check the publisher tables. Before 
> continuing
> +   with other operations the user should check that publisher tables did not
> +   have data with different origins to prevent data inconsistency issues on 
> the
> +   subscriber.
> +  
>
> I am not sure about that wording saying "to prevent data inconsistency
> issues" because I thought maybe is already too late because any
> unwanted origin data is already copied during the initial sync. I
> think the user can do it try to clean up any problems caused before
> things become even worse (??)
>
> ~~~
>
> You asked for my thoughts (see [1] 6b):
>
> > There is nothing much a user can do in this case. Only option would be
> > to take a backup before the operation and restore it and then recreate
> > the replication setup. I was not sure if we should document these
> > steps as similar inconsistent data could get created without the
> > origin option . Thoughts?
>
> I think those cases are a bit different:
>
> - For a normal scenario (i.e. origin=ANY) then inconsistent data just
> refers to problems like maybe PK violations so I think you just get
> what you get and have to deal with the errors...
>
> - But when the user said origin=NONE they are specifically saying they
> do NOT want any non-local data, yet they still might end up getting
> data contrary to their wishes. So I think maybe there ought to be
> something documented to warn about this potentially happening. My
> first impression is that all this seems like a kind of terrible
> problem because if it happens then cleanup could be difficult - if
> that subscriber in turn was also a publisher then this bad data might
> have propagated all over the place already! Anyway, I guess all this
> could be mentioned in some (new ?) section of the
> logical-replication.sgml file (i.e. patch 0002).
>
> IDEA: I was wondering if the documentation should recommend (or maybe
> we can even enforce this) that when using origin=NONE the user should
> always create the subscription with enabled=FALSE. That way if there
> are some warnings about potential bad origin data then there it might
> be possible to take some action or change their mind *before* any
> unwanted data pollutes everything.

Updated document based on your suggestion.

> ==
>
> 3. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> + if (count == 0)
> + tablenames = makeStringInfo();
> + else
> + appendStringInfoString(tablenames, ", ");
> +
> + appendStringInfo(tablenames, "%s.%s", nspname, relname);
> + count++;
>
> I think the quotes are not correct - each separate table name should be 
> quoted.

We will not be displaying tables anymore, the comment does not apply anymore

> ~~~
>
> 4.
>
> + /*
> + * There might be many tables present in this case, we will display a
> + * maximum of 100 tables followed by "..." to indicate there might be
> + * more tables.
> + */
> + if (count == 100)
> + {
> + appendStringInfoString(tablenames, ", ...");
> + break;
> + }
>
> 4a.
> IMO this code should be part of the previous if/else
>
> e.g.
> if (count == 0) ... makeStringInfo()
> else if (count > 100) ... append ellipsis "..." and break out of the loop
> else ... append table name to the message
>
> Doing it in this way means "..." definitely means there are more than
> 100 bad tables.

This code is removed, the comment is no more applicable.

> ~
>
> 4b.
> Changing like above (#4a) simplifies the comment because it removes
> doubts like "might be…" since you already know you found the 101st
> table.
>
> SUGGESTION (comment)
> The message displays a maximum of 100 tables having potentially
> non-local data. Any more than that will be indicated by "...".

This code is removed, the comment is no more applicable.

> ~
>
> 4c.
> It still seems a bit disconcerting to me that there can be cases where
> bad data was possibly copied but there is no record of what tables
> have been polluted. Perhaps when the maximum bad tables are exceeded
> then there can be some additional message to tell users what SQL they
> can use to discover what all the other bad tables were.

We will not be displaying the tablename anymore, we will be displaying
the publication names which has been subscribed from other
publications.

> 

Re: Handle infinite recursion in logical replication setup

2022-08-31 Thread Amit Kapila
On Wed, Aug 31, 2022 at 11:35 AM Peter Smith  wrote:
>
> Here are my review comments for v43-0001.
>
> ==
>
> 1. Commit message
>
> The initial copy phase has no way to know the origin of the row data,
> so if 'copy_data = true' in the step 4 below, log a warning to notify user
> that potentially non-local data might have been copied.
>
> 1a.
> "in the step 4" -> "in step 4"
>
> ~
>
> 1b.
> "notify user" -> "notify the user"
>
> ==
>
> 2. doc/src/sgml/ref/create_subscription.sgml
>
> +  
> +   If the subscription is created with origin = none and
> +   copy_data = true, it will check if the publisher has
> +   subscribed to the same table from other publishers and, if so, log a
> +   warning to notify the user to check the publisher tables. Before 
> continuing
> +   with other operations the user should check that publisher tables did not
> +   have data with different origins to prevent data inconsistency issues on 
> the
> +   subscriber.
> +  
>
> I am not sure about that wording saying "to prevent data inconsistency
> issues" because I thought maybe is already too late because any
> unwanted origin data is already copied during the initial sync. I
> think the user can do it try to clean up any problems caused before
> things become even worse (??)
>
> ~~~
>
> You asked for my thoughts (see [1] 6b):
>
> > There is nothing much a user can do in this case. Only option would be
> > to take a backup before the operation and restore it and then recreate
> > the replication setup. I was not sure if we should document these
> > steps as similar inconsistent data could get created without the
> > origin option . Thoughts?
>
> I think those cases are a bit different:
>
> - For a normal scenario (i.e. origin=ANY) then inconsistent data just
> refers to problems like maybe PK violations so I think you just get
> what you get and have to deal with the errors...
>
> - But when the user said origin=NONE they are specifically saying they
> do NOT want any non-local data, yet they still might end up getting
> data contrary to their wishes. So I think maybe there ought to be
> something documented to warn about this potentially happening. My
> first impression is that all this seems like a kind of terrible
> problem because if it happens then cleanup could be difficult - if
> that subscriber in turn was also a publisher then this bad data might
> have propagated all over the place already! Anyway, I guess all this
> could be mentioned in some (new ?) section of the
> logical-replication.sgml file (i.e. patch 0002).
>
> IDEA: I was wondering if the documentation should recommend (or maybe
> we can even enforce this) that when using origin=NONE the user should
> always create the subscription with enabled=FALSE. That way if there
> are some warnings about potential bad origin data then there it might
> be possible to take some action or change their mind *before* any
> unwanted data pollutes everything.
>
> ==
>
> 3. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> + if (count == 0)
> + tablenames = makeStringInfo();
> + else
> + appendStringInfoString(tablenames, ", ");
> +
> + appendStringInfo(tablenames, "%s.%s", nspname, relname);
> + count++;
>
> I think the quotes are not correct - each separate table name should be 
> quoted.
>
> ~~~
>
> 4.
>
> + /*
> + * There might be many tables present in this case, we will display a
> + * maximum of 100 tables followed by "..." to indicate there might be
> + * more tables.
> + */
> + if (count == 100)
> + {
> + appendStringInfoString(tablenames, ", ...");
> + break;
> + }
>
> 4a.
> IMO this code should be part of the previous if/else
>
> e.g.
> if (count == 0) ... makeStringInfo()
> else if (count > 100) ... append ellipsis "..." and break out of the loop
> else ... append table name to the message
>
> Doing it in this way means "..." definitely means there are more than
> 100 bad tables.
>
> ~
>
> 4b.
> Changing like above (#4a) simplifies the comment because it removes
> doubts like "might be…" since you already know you found the 101st
> table.
>
> SUGGESTION (comment)
> The message displays a maximum of 100 tables having potentially
> non-local data. Any more than that will be indicated by "...".
>
> ~
>

Are we using this style of an error message anywhere else in the code?
If not, then I think we should avoid it here as well as it appears a
bit adhoc to me in the sense that why restrict at 100 tables. Can we
instead think of displaying the publications list in the message? So
errdetail would be something like: Subscribed publications \"%s\" has
been subscribed from some other publisher. Then we can probably give a
SQL query for a user to find tables that may data from multiple
origins especially if that is not complicated. Also, then we can
change the function name to check_publications_origin().

Few other minor comments on v43-0001*
1.
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("subscription %s 

RE: Handle infinite recursion in logical replication setup

2022-08-31 Thread shiy.f...@fujitsu.com
On Wed, Aug 31, 2022 1:06 AM vignesh C  wrote:
> 
> The attached v43 patch has the changes for the same.
> 

Thanks for updating the patch.

Here is a comment on the 0001 patch.

+   if (!isnewtable)
+   {
+   pfree(nspname);
+   pfree(relname);
+   continue;
+   }

If it is a new table, in which case it would log a warning, should we also call
pfree()?

Regards,
Shi yu


Re: Handle infinite recursion in logical replication setup

2022-08-31 Thread Peter Smith
On Wed, Aug 31, 2022 at 8:36 PM Amit Kapila  wrote:
>
> On Wed, Aug 31, 2022 at 11:35 AM Peter Smith  wrote:
> >
> > Here are my review comments for v43-0001.
> >
...
> > ==
> >
> > 2. doc/src/sgml/ref/create_subscription.sgml
> >
> > +  
> > +   If the subscription is created with origin = none and
> > +   copy_data = true, it will check if the publisher has
> > +   subscribed to the same table from other publishers and, if so, log a
> > +   warning to notify the user to check the publisher tables. Before 
> > continuing
> > +   with other operations the user should check that publisher tables did 
> > not
> > +   have data with different origins to prevent data inconsistency issues 
> > on the
> > +   subscriber.
> > +  
> >
> > I am not sure about that wording saying "to prevent data inconsistency
> > issues" because I thought maybe is already too late because any
> > unwanted origin data is already copied during the initial sync. I
> > think the user can do it try to clean up any problems caused before
> > things become even worse (??)
> >
>
> Oh no, it is not too late in all cases. The problem can only occur if
> the user will try to subscribe from all the different publications
> with copy_data = true. We are anyway trying to clarify in the second
> patch the right way to accomplish this. So, I am not sure what better
> we can do here. The only bulletproof way is to provide some APIs where
> users don't need to bother about all these cases, basically something
> similar to what Kuroda-San is working on in the thread [1].
>

The point of my review comment was only about the wording of the note
- specifically, you cannot "prevent" something (e,g, data
inconsistency) if it has already happened.

Maybe modify the wording like below?

BEFORE
Before continuing with other operations the user should check that
publisher tables did not have data with different origins to prevent
data inconsistency issues on the subscriber.

AFTER
If a publisher table with data from different origins was found to
have been copied to the subscriber, then some corrective action may be
necessary before continuing with other operations.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-08-31 Thread Amit Kapila
On Wed, Aug 31, 2022 at 11:35 AM Peter Smith  wrote:
>
> Here are my review comments for v43-0001.
>
> ==
>
> 1. Commit message
>
> The initial copy phase has no way to know the origin of the row data,
> so if 'copy_data = true' in the step 4 below, log a warning to notify user
> that potentially non-local data might have been copied.
>
> 1a.
> "in the step 4" -> "in step 4"
>
> ~
>
> 1b.
> "notify user" -> "notify the user"
>
> ==
>
> 2. doc/src/sgml/ref/create_subscription.sgml
>
> +  
> +   If the subscription is created with origin = none and
> +   copy_data = true, it will check if the publisher has
> +   subscribed to the same table from other publishers and, if so, log a
> +   warning to notify the user to check the publisher tables. Before 
> continuing
> +   with other operations the user should check that publisher tables did not
> +   have data with different origins to prevent data inconsistency issues on 
> the
> +   subscriber.
> +  
>
> I am not sure about that wording saying "to prevent data inconsistency
> issues" because I thought maybe is already too late because any
> unwanted origin data is already copied during the initial sync. I
> think the user can do it try to clean up any problems caused before
> things become even worse (??)
>

Oh no, it is not too late in all cases. The problem can only occur if
the user will try to subscribe from all the different publications
with copy_data = true. We are anyway trying to clarify in the second
patch the right way to accomplish this. So, I am not sure what better
we can do here. The only bulletproof way is to provide some APIs where
users don't need to bother about all these cases, basically something
similar to what Kuroda-San is working on in the thread [1].

> ~~~
>
> You asked for my thoughts (see [1] 6b):
>
> > There is nothing much a user can do in this case. Only option would be
> > to take a backup before the operation and restore it and then recreate
> > the replication setup. I was not sure if we should document these
> > steps as similar inconsistent data could get created without the
> > origin option . Thoughts?
>
> I think those cases are a bit different:
>
> - For a normal scenario (i.e. origin=ANY) then inconsistent data just
> refers to problems like maybe PK violations so I think you just get
> what you get and have to deal with the errors...
>
> - But when the user said origin=NONE they are specifically saying they
> do NOT want any non-local data, yet they still might end up getting
> data contrary to their wishes. So I think maybe there ought to be
> something documented to warn about this potentially happening. My
> first impression is that all this seems like a kind of terrible
> problem because if it happens then cleanup could be difficult - if
> that subscriber in turn was also a publisher then this bad data might
> have propagated all over the place already! Anyway, I guess all this
> could be mentioned in some (new ?) section of the
> logical-replication.sgml file (i.e. patch 0002).
>
> IDEA: I was wondering if the documentation should recommend (or maybe
> we can even enforce this) that when using origin=NONE the user should
> always create the subscription with enabled=FALSE. That way if there
> are some warnings about potential bad origin data then there it might
> be possible to take some action or change their mind *before* any
> unwanted data pollutes everything.
>

If we want to give this suggestion, then we need to also say that this
is only valid when copy_data is true. The other drawback is if users
always need to do this to use origin=NONE then this can be a burden to
the user. I am not against having such a NOTE but the tone should not
be to enforce users to do this.

[1] - 
https://www.postgresql.org/message-id/CAHut%2BPuwRAoWY9pz%3DEubps3ooQCOBFiYPU9Yi%3DVB-U%2ByORU7OA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-08-30 Thread Peter Smith
Here are some review comments for patch v43-0002:

==

1. doc/src/sgml/ref/create_subscription.sgml

@@ -403,7 +403,9 @@ CREATE SUBSCRIPTION subscription_name for
+   how copy_data and origin can be used
+   to set up replication between primaries.
   

Regarding my earlier v43-0001 review (see [1] comment #2) perhaps
another pg docs section should be added in the
logical-replication.sgml (e.g. "Specifying origins during CREATE
SUBSCRIPTION"), so then this Notes text also should have more added to
it.

SUGGESTION
Refer to  for details about potential initialization
inconsistency warnings using origin=NONE.
Refer to  for how copy_data and origin can be used to set up
replication between primaries.

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPvonTd423-cWqoxh0w8Bd_Po3OToqqyxuR1iMNmxSLr_Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-08-30 Thread Peter Smith
Here are my review comments for v43-0001.

==

1. Commit message

The initial copy phase has no way to know the origin of the row data,
so if 'copy_data = true' in the step 4 below, log a warning to notify user
that potentially non-local data might have been copied.

1a.
"in the step 4" -> "in step 4"

~

1b.
"notify user" -> "notify the user"

==

2. doc/src/sgml/ref/create_subscription.sgml

+  
+   If the subscription is created with origin = none and
+   copy_data = true, it will check if the publisher has
+   subscribed to the same table from other publishers and, if so, log a
+   warning to notify the user to check the publisher tables. Before continuing
+   with other operations the user should check that publisher tables did not
+   have data with different origins to prevent data inconsistency issues on the
+   subscriber.
+  

I am not sure about that wording saying "to prevent data inconsistency
issues" because I thought maybe is already too late because any
unwanted origin data is already copied during the initial sync. I
think the user can do it try to clean up any problems caused before
things become even worse (??)

~~~

You asked for my thoughts (see [1] 6b):

> There is nothing much a user can do in this case. Only option would be
> to take a backup before the operation and restore it and then recreate
> the replication setup. I was not sure if we should document these
> steps as similar inconsistent data could get created without the
> origin option . Thoughts?

I think those cases are a bit different:

- For a normal scenario (i.e. origin=ANY) then inconsistent data just
refers to problems like maybe PK violations so I think you just get
what you get and have to deal with the errors...

- But when the user said origin=NONE they are specifically saying they
do NOT want any non-local data, yet they still might end up getting
data contrary to their wishes. So I think maybe there ought to be
something documented to warn about this potentially happening. My
first impression is that all this seems like a kind of terrible
problem because if it happens then cleanup could be difficult - if
that subscriber in turn was also a publisher then this bad data might
have propagated all over the place already! Anyway, I guess all this
could be mentioned in some (new ?) section of the
logical-replication.sgml file (i.e. patch 0002).

IDEA: I was wondering if the documentation should recommend (or maybe
we can even enforce this) that when using origin=NONE the user should
always create the subscription with enabled=FALSE. That way if there
are some warnings about potential bad origin data then there it might
be possible to take some action or change their mind *before* any
unwanted data pollutes everything.

==

3. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed

+ if (count == 0)
+ tablenames = makeStringInfo();
+ else
+ appendStringInfoString(tablenames, ", ");
+
+ appendStringInfo(tablenames, "%s.%s", nspname, relname);
+ count++;

I think the quotes are not correct - each separate table name should be quoted.

~~~

4.

+ /*
+ * There might be many tables present in this case, we will display a
+ * maximum of 100 tables followed by "..." to indicate there might be
+ * more tables.
+ */
+ if (count == 100)
+ {
+ appendStringInfoString(tablenames, ", ...");
+ break;
+ }

4a.
IMO this code should be part of the previous if/else

e.g.
if (count == 0) ... makeStringInfo()
else if (count > 100) ... append ellipsis "..." and break out of the loop
else ... append table name to the message

Doing it in this way means "..." definitely means there are more than
100 bad tables.

~

4b.
Changing like above (#4a) simplifies the comment because it removes
doubts like "might be…" since you already know you found the 101st
table.

SUGGESTION (comment)
The message displays a maximum of 100 tables having potentially
non-local data. Any more than that will be indicated by "...".

~

4c.
It still seems a bit disconcerting to me that there can be cases where
bad data was possibly copied but there is no record of what tables
have been polluted. Perhaps when the maximum bad tables are exceeded
then there can be some additional message to tell users what SQL they
can use to discover what all the other bad tables were.

~~~

5.

+ * XXX: For simplicity, we don't check whether the table has any data
+ * or not. If the table doesn't have any data then we don't need to
+ * distinguish between data having origin and data not having origin so
+ * we can avoid throwing a warning in that case.

"throwing a warning" -> "logging a warning"

~~~

6.

+ errdetail("Subscribed table(s) \"%s\" has been subscribed from some
other publisher.",
+   tablenames->data),

I don't think the table name list should be quoted in the errdetail,
because each of the individual table names should have been quoted.
See previous review comment #3.

==

7. src/test/subscription/t/030_origin.pl

+#

Re: Handle infinite recursion in logical replication setup

2022-08-30 Thread vignesh C
On Mon, 29 Aug 2022 at 12:01, Peter Smith  wrote:
>
> Here are some review comments for patch v42-0002:
>
> ==
>
> 1. doc/src/sgml/logical-replication.sgml
>
> copy_data = true 
>
> There are a couple of these tags where there is a trailing space
> before the . I guess it is doing no harm, but it is doing no
> good either, so IMO better to get rid of the space.

Modified

> ~~
>
> 2. doc/src/sgml/logical-replication.sgml - 31.3.1. Setting replication
> between two primaries
>
> User need to ensure that no data changes happen on table t1 on
> primary1 and primary2 until the setup is completed.
>
> SUGGESTION
> The user must ensure that no data changes happen on table t1 of
> primary1 and primary2 until the setup is completed.

Modified

> ~~~
>
> 3. doc/src/sgml/logical-replication.sgml - 31.3.2. Adding a new
> primary when there is no table data on any of the primaries
>
> User need to ensure that no data changes happen on table t1 on all the
> primaries primary1, primary2 and primary3 until the setup is
> completed.
>
> SUGGESTION
> The user must ensure that no data changes happen on table t1 of all
> the primaries primary1, primary2 and primary3 until the setup is
> completed.

Modified

> ~~~
>
> 4. doc/src/sgml/logical-replication.sgml - 31.3.3. Adding a new
> primary when table data is present on the existing primaries
>
> User need to ensure that no operations should be performed on table t1
> on primary2 and primary3 until the setup is completed.
>
> SUGGESTION
> The user must ensure that no operations are performed on table t1 of
> primary2 and primary3 until the setup is completed.
>
> Here also you changed the wording - "no data changes happen" versus
> "no operations should be performed". Was that a deliberate difference?
> Maybe they should all be consistent wording? If they are
> interchangeable IMO the "no operations are performed..." wording was
> the better of the two.

Modified, it was not a deliberate change. I have changed it to "no
operations are performed" in other places too.

> ~~~
>
> 5. doc/src/sgml/logical-replication.sgml - 31.3.4. Generic steps for
> adding a new primary to an existing set of primaries
>
> Step-2: User need to ensure that no changes happen on the required
> tables of the new primary until the setup is complete.
>
> SUGGESTION
> Step-2: The user must ensure that no changes happen on the required
> tables of the new primary until the setup is complete.

Modified

> ~~~
>
> 6.
>
> Step-4. User need to ensure that no changes happen on the required
> tables of the existing primaries except the first primary until the
> setup is complete.
>
> SUGGESTION
> Step-4. The user must ensure that no changes happen on the required
> tables of the existing primaries except the first primary until the
> setup is complete.

Modified

> ~~~
>
> 7. (the example)
>
> 7a.
> Step-2. User need to ensure that no changes happen on table t1 on
> primary4 until the setup is completed.
>
> SUGGESTION
> Step-2. The user must ensure that no changes happen on table t1 of
> primary4 until the setup is completed.

Modified

> ~
>
> 7b.
> Step-4. User need to ensure that no changes happen on table t1 on
> primary2 and primary3 until the setup is completed.
>
> SUGGESTION
> Step-4. The user must ensure that no changes happen on table t1 of
> primary2 and primary3 until the setup is completed.

Modified

Thanks for the comments, the changes for the same are available in the
v43 patch attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm1V%2BAvzPsUcq%3DmNYG%2BJfAyovTWBe4vWKTknOgH_ko1E0Q%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-08-30 Thread vignesh C
On Mon, 29 Aug 2022 at 11:59, Peter Smith  wrote:
>
> Here are some review comments for patch v42-0001:
>
> ==
>
> 1. Commit message
>
> A later review comment below suggests some changes to the WARNING
> message so if those changes are made then the example in this commit
> message also needs to be modified.

Modified

> ==
>
> 2. doc/src/sgml/ref/alter_subscription.sgml - copy_data
>
> Refer to the Notes for the usage of true for copy_data parameter and
> its interaction with the origin parameter.
>
> I think saying "usage of true" sounded a bit strange.
>
> SUGGESTION
> Refer to the Notes about how copy_data = true can interact with the
> origin parameter.

Modified

> ==
>
> 3. doc/src/sgml/ref/create_subscription.sgml - copy data
>
> Refer to the Notes for the usage of true for copy_data parameter and
> its interaction with the origin parameter.
>
> SUGGESTION
> (same as #2)

Modified

> ~~
>
> 4. doc/src/sgml/ref/create_subscription.sgml - origin
>
> Refer to the Notes for the usage of true for copy_data parameter and
> its interaction with the origin parameter.
>
> SUGGESTION
> (same as #2)

Modified

> ~~~
>
> 5. doc/src/sgml/ref/create_subscription.sgml - Notes
>
> +  
> +   If the subscription is created with origin = none and
> +   copy_data = true, it will check if the publisher has
> +   subscribed to the same table from other publishers and, if so, throw a
> +   warning to notify user to check the publisher tables. The user can ensure
>
> "throw a warning to notify user" -> "log a warning to notify the user"

Modified

> ~~
>
> 6.
>
> +   warning to notify user to check the publisher tables. The user can ensure
> +   that publisher tables do not have data which has an origin associated 
> before
> +   continuing with any other operations to prevent inconsistent data being
> +   replicated.
> +  
>
> 6a.
>
> I'm not so sure about this. IMO the warning is not about the
> replication – really it is about the COPY which has already happened
> anyway. So the user can't really prevent anything from going wrong;
> instead, they have to take some action to clean up if anything did go
> wrong.
>
> SUGGESTION
> Before continuing with other operations the user should check that
> publisher tables did not have data with different origins, otherwise
> inconsistent data may have been copied.

Modified based on the suggestion provided by Amit.

> ~
>
> 6b.
>
> I am also wondering what can the user do now. Assuming there was bad
> COPY then the subscriber table has already got unwanted stuff copied
> into it. Is there any advice we can give to help users fix this mess?

There is nothing much a user can do in this case. Only option would be
to take a backup before the operation and restore it and then recreate
the replication setup. I was not sure if we should document these
steps as similar inconsistent data could get created without the
origin option . Thoughts?

> ==
>
> 7. src/backend/commands/subscriptioncmds.c - AlterSubscription_refresh
>
> + /* Check whether we can allow copy of newly added relations. */
> + check_pub_table_subscribed(wrconn, sub->publications, copy_data,
> +sub->origin, subrel_local_oids,
> +subrel_count);
>
> "whether we can allow" seems not quite the right wording here anymore,
> because now there is no ERROR stopping this - so if there was unwanted
> data the COPY will proceed to copy it anyhow...

Removed the comment as the function details the necessary things.

> ~~~
>
> 8. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> @@ -1781,6 +1794,121 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
>   table_close(rel, RowExclusiveLock);
>  }
>
> +/*
> + * Check and throw a warning if the publisher has subscribed to the same 
> table
> + * from some other publisher. This check is required only if "copy_data = 
> true"
> + * and "origin = none" for CREATE SUBSCRIPTION and
> + * ALTER SUBSCRIPTION ... REFRESH statements to notify user that data having
> + * origin might have been copied.
>
> "throw a warning" -> "log a warning"
>
> "to notify user" -> "to notify the user" ?

Modified

> ~~~
>
> 9.
>
> + if (copydata == false || !origin ||
> + (pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0))
> + return;
>
> SUGGESTION
> if (!copydata || !origin ||
> (pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0))

Modified

> ~~~
>
> 10. (Question)
>
> I can't tell just by reading the code if FOR ALL TABLES case is
> handled – e.g. will this recognise the case were the publisher might
> have table data from other origins because it has a subscription on
> some other node that was publishing "FOR ALL TABLES"?

Yes it handles it.

> ~~~
>
> 11.
>
> + ereport(WARNING,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("publisher has subscribed table \"%s.%s\" from some other publisher",
> +nspname, relname),
> + errdetail("Publisher might have subscribed one or more tables from
> some other publisher."),
> + errhint("Verify

Re: Handle infinite recursion in logical replication setup

2022-08-29 Thread Amit Kapila
On Mon, Aug 29, 2022 at 11:59 AM Peter Smith  wrote:
>
> Here are some review comments for patch v42-0001:
>
> ~~
>
> 6.
>
> +   warning to notify user to check the publisher tables. The user can ensure
> +   that publisher tables do not have data which has an origin associated 
> before
> +   continuing with any other operations to prevent inconsistent data being
> +   replicated.
> +  
>
> 6a.
>
> I'm not so sure about this. IMO the warning is not about the
> replication – really it is about the COPY which has already happened
> anyway. So the user can't really prevent anything from going wrong;
> instead, they have to take some action to clean up if anything did go
> wrong.
>
> SUGGESTION
> Before continuing with other operations the user should check that
> publisher tables did not have data with different origins, otherwise
> inconsistent data may have been copied.
>
> ~

I would like to avoid the use of 'may' here. So, let's change it to
something like: "Before continuing with other operations the user
should check that publisher tables did not have data with different
origins to prevent data inconsistency issues on the subscriber."
>
> 6b.
>
> I am also wondering what can the user do now. Assuming there was bad
> COPY then the subscriber table has already got unwanted stuff copied
> into it. Is there any advice we can give to help users fix this mess?
>

I don't think the user can do much in that case. She will probably
need to re-create the replication.

>
> 11.
>
> + ereport(WARNING,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("publisher has subscribed table \"%s.%s\" from some other publisher",
> +nspname, relname),
> + errdetail("Publisher might have subscribed one or more tables from
> some other publisher."),
> + errhint("Verify that these publisher tables do not have data that
> has an origin associated before proceeding to avoid inconsistency."));
> + break;
>
> 11a.
> I'm not sure having that "break" code is correct logic. Won't that
> mean the user will only get a warning for the first potential problem
> encountered but then other potential problems tables will have no
> warnings at all so the user may not be made aware of them? Perhaps you
> need to first gather the list of all the suspicious tables before
> logging a single warning which shows that list? Or perhaps you need to
> log multiple warnings – one warning per suspicious table?
>
> ~
>
> 11b.
> The WARNING message seems a bit inside-out because the errmsg is
> giving more details than the errdetail.
>
> SUGGESTIONS (or something similar)
> errmsg - "subscription XXX requested origin=NONE but may have copied
> data that had a different origin."
> errdetail – Publisher YYY has subscribed table \"%s.%s\" from some
> other publisher"
>

Your suggestion is better but we can't say 'copied' because the copy
may start afterwards by tablesync worker. Also, using 'may' is not
advisable in error messages. How about : "subscription XXX requested
origin=NONE but might copy data that had a different origin."?

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-08-29 Thread Amit Kapila
On Wed, Aug 24, 2022 at 7:27 PM vignesh C  wrote:
>
> Since there was no objections to change it to throw a warning, I have
> made the changes for the same.
>

Review comments for v42-0001*
==
1. Can we improve the query in check_pub_table_subscribed() so that it
doesn't fetch any table that is already part of the subscription on
the subscriber? This may be better than what the patch is doing which
is to first fetch such information and then skip it. If forming this
query turns out to be too complex then we can retain your method as
well but I feel it is worth trying to optimize the query used in the
patch.

2. I thought that it may be better to fetch all the tables that have
the possibility to have data from more than one origin but maybe the
list will be too long especially for FOR ALL TABLE types of cases. Is
this the reason you have decided to give a WARNING as soon as you see
any such table? If so, probably adding a comment for it can be good.

3.
+ $node_publisher->wait_for_catchup($sub_name);
+
+ # also wait for initial table sync to finish
+ $node_subscriber->poll_query_until('postgres', $synced_query)
+   or die "Timed out while waiting for subscriber to synchronize data";



+$node_B->safe_psql(
+ 'postgres', "
+ALTER SUBSCRIPTION $subname_BA REFRESH PUBLICATION");
+
+$node_B->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";

You can replace this and any similar code in the patch with a method
used in commit 0c20dd33db.

4.
+###
+# Join 3rd node (node_C) to the existing 2 nodes(node_A & node_B) bidirectional
+# replication setup when the existing nodes (node_A & node_B) has pre-existing
+# data and the new node (node_C) does not have any data.
+###
+$result = $node_A->safe_psql('postgres', "SELECT * FROM tab ORDER BY 1;");
+is($result, qq(), 'Check existing data');
+
+$result = $node_B->safe_psql('postgres', "SELECT * FROM tab ORDER BY 1;");
+is($result, qq(), 'Check existing data');
+
+$result = $node_C->safe_psql('postgres', "SELECT * FROM tab ORDER BY 1;");
+is($result, qq(), 'Check existing data');

The comments say that for this test, two of the nodes have some
pre-existing data but I don't see the same in the test results. The
test following this one will also have a similar effect. BTW, I am not
sure all the new three node tests added by this patch are required
because I don't see if they have additional code coverage or test
anything which is not tested without those. This has doubled the
amount of test timing for this test file which is okay if these tests
make any useful contribution but otherwise, it may be better to remove
these. Am, I missing something?

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-08-28 Thread Peter Smith
Here are some review comments for patch v42-0002:

==

1. doc/src/sgml/logical-replication.sgml

copy_data = true 

There are a couple of these tags where there is a trailing space
before the . I guess it is doing no harm, but it is doing no
good either, so IMO better to get rid of the space.

~~

2. doc/src/sgml/logical-replication.sgml - 31.3.1. Setting replication
between two primaries

User need to ensure that no data changes happen on table t1 on
primary1 and primary2 until the setup is completed.

SUGGESTION
The user must ensure that no data changes happen on table t1 of
primary1 and primary2 until the setup is completed.

~~~

3. doc/src/sgml/logical-replication.sgml - 31.3.2. Adding a new
primary when there is no table data on any of the primaries

User need to ensure that no data changes happen on table t1 on all the
primaries primary1, primary2 and primary3 until the setup is
completed.

SUGGESTION
The user must ensure that no data changes happen on table t1 of all
the primaries primary1, primary2 and primary3 until the setup is
completed.

~~~

4. doc/src/sgml/logical-replication.sgml - 31.3.3. Adding a new
primary when table data is present on the existing primaries

User need to ensure that no operations should be performed on table t1
on primary2 and primary3 until the setup is completed.

SUGGESTION
The user must ensure that no operations are performed on table t1 of
primary2 and primary3 until the setup is completed.

Here also you changed the wording - "no data changes happen" versus
"no operations should be performed". Was that a deliberate difference?
Maybe they should all be consistent wording? If they are
interchangeable IMO the "no operations are performed..." wording was
the better of the two.

~~~

5. doc/src/sgml/logical-replication.sgml - 31.3.4. Generic steps for
adding a new primary to an existing set of primaries

Step-2: User need to ensure that no changes happen on the required
tables of the new primary until the setup is complete.

SUGGESTION
Step-2: The user must ensure that no changes happen on the required
tables of the new primary until the setup is complete.

~~~

6.

Step-4. User need to ensure that no changes happen on the required
tables of the existing primaries except the first primary until the
setup is complete.

SUGGESTION
Step-4. The user must ensure that no changes happen on the required
tables of the existing primaries except the first primary until the
setup is complete.

~~~

7. (the example)

7a.
Step-2. User need to ensure that no changes happen on table t1 on
primary4 until the setup is completed.

SUGGESTION
Step-2. The user must ensure that no changes happen on table t1 of
primary4 until the setup is completed.

~

7b.
Step-4. User need to ensure that no changes happen on table t1 on
primary2 and primary3 until the setup is completed.

SUGGESTION
Step-4. The user must ensure that no changes happen on table t1 of
primary2 and primary3 until the setup is completed.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-08-28 Thread Peter Smith
Here are some review comments for patch v42-0001:

==

1. Commit message

A later review comment below suggests some changes to the WARNING
message so if those changes are made then the example in this commit
message also needs to be modified.

==

2. doc/src/sgml/ref/alter_subscription.sgml - copy_data

Refer to the Notes for the usage of true for copy_data parameter and
its interaction with the origin parameter.

I think saying "usage of true" sounded a bit strange.

SUGGESTION
Refer to the Notes about how copy_data = true can interact with the
origin parameter.

==

3. doc/src/sgml/ref/create_subscription.sgml - copy data

Refer to the Notes for the usage of true for copy_data parameter and
its interaction with the origin parameter.

SUGGESTION
(same as #2)

~~

4. doc/src/sgml/ref/create_subscription.sgml - origin

Refer to the Notes for the usage of true for copy_data parameter and
its interaction with the origin parameter.

SUGGESTION
(same as #2)

~~~

5. doc/src/sgml/ref/create_subscription.sgml - Notes

+  
+   If the subscription is created with origin = none and
+   copy_data = true, it will check if the publisher has
+   subscribed to the same table from other publishers and, if so, throw a
+   warning to notify user to check the publisher tables. The user can ensure

"throw a warning to notify user" -> "log a warning to notify the user"

~~

6.

+   warning to notify user to check the publisher tables. The user can ensure
+   that publisher tables do not have data which has an origin associated before
+   continuing with any other operations to prevent inconsistent data being
+   replicated.
+  

6a.

I'm not so sure about this. IMO the warning is not about the
replication – really it is about the COPY which has already happened
anyway. So the user can't really prevent anything from going wrong;
instead, they have to take some action to clean up if anything did go
wrong.

SUGGESTION
Before continuing with other operations the user should check that
publisher tables did not have data with different origins, otherwise
inconsistent data may have been copied.

~

6b.

I am also wondering what can the user do now. Assuming there was bad
COPY then the subscriber table has already got unwanted stuff copied
into it. Is there any advice we can give to help users fix this mess?

==

7. src/backend/commands/subscriptioncmds.c - AlterSubscription_refresh

+ /* Check whether we can allow copy of newly added relations. */
+ check_pub_table_subscribed(wrconn, sub->publications, copy_data,
+sub->origin, subrel_local_oids,
+subrel_count);

"whether we can allow" seems not quite the right wording here anymore,
because now there is no ERROR stopping this - so if there was unwanted
data the COPY will proceed to copy it anyhow...

~~~

8. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed

@@ -1781,6 +1794,121 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
  table_close(rel, RowExclusiveLock);
 }

+/*
+ * Check and throw a warning if the publisher has subscribed to the same table
+ * from some other publisher. This check is required only if "copy_data = true"
+ * and "origin = none" for CREATE SUBSCRIPTION and
+ * ALTER SUBSCRIPTION ... REFRESH statements to notify user that data having
+ * origin might have been copied.

"throw a warning" -> "log a warning"

"to notify user" -> "to notify the user" ?

~~~

9.

+ if (copydata == false || !origin ||
+ (pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0))
+ return;

SUGGESTION
if (!copydata || !origin ||
(pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0))

~~~

10. (Question)

I can't tell just by reading the code if FOR ALL TABLES case is
handled – e.g. will this recognise the case were the publisher might
have table data from other origins because it has a subscription on
some other node that was publishing "FOR ALL TABLES"?

~~~

11.

+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("publisher has subscribed table \"%s.%s\" from some other publisher",
+nspname, relname),
+ errdetail("Publisher might have subscribed one or more tables from
some other publisher."),
+ errhint("Verify that these publisher tables do not have data that
has an origin associated before proceeding to avoid inconsistency."));
+ break;

11a.
I'm not sure having that "break" code is correct logic. Won't that
mean the user will only get a warning for the first potential problem
encountered but then other potential problems tables will have no
warnings at all so the user may not be made aware of them? Perhaps you
need to first gather the list of all the suspicious tables before
logging a single warning which shows that list? Or perhaps you need to
log multiple warnings – one warning per suspicious table?

~

11b.
The WARNING message seems a bit inside-out because the errmsg is
giving more details than the errdetail.

SUGGESTIONS (or something similar)
errmsg - "subscription XXX requested origin=NONE but m

Re: Handle infinite recursion in logical replication setup

2022-08-28 Thread Amit Kapila
On Mon, Aug 29, 2022 at 8:24 AM vignesh C  wrote:
>
> On Fri, Aug 26, 2022 at 9:52 AM Dilip Kumar  wrote:
> >
> > IMHO, since the user has specifically asked for origin=NONE but we do
> > not have any way to detect the origin during initial sync so I think
> > this could be documented and we can also issue the WARNING.  So that
> > users notice that part and carefully set up the replication.  OTOH, I
> > do not think that giving an error is very inconvenient because we are
> > already providing a new option "origin=NONE" so along with that lets
> > force the user to choose between copy_data=off or copy_data=force and
> > with that, there is no scope for mistakes.
>

Initially, I also thought that giving an error should be okay in this
case but later multiple people voted against that idea primarily
because we won't be able to detect whether the initial sync data
contains data from multiple origins. Consider that even though the
publisher is subscribed to some other publisher but it may not have
gotten any data from it by the time we try to subscribe from it. Also,
one might have removed the subscription that led to data from multiple
origins on the publisher, in this case, may be one can say that we can
consider that the data is now owned by the publisher but I am not
sure. So, considering this, I think giving a WARNING and documenting
how to set up replication correctly sounds like a reasonable way
forward.

> Since Jonathan also had suggested to throw a warning as in [1] and
> Hou-san also had suggested to throw a warning as in [2],
>

Agreed that multiple seems to be in favor of that approach. We can
always promote it to ERROR later if others and or users felt so.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-08-28 Thread vignesh C
On Fri, Aug 26, 2022 at 9:52 AM Dilip Kumar  wrote:
>
> On Mon, Aug 22, 2022 at 9:19 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > > Jonathan, Sawada-San, Hou-San, and others, what do you think is the best 
> > > way
> > > to move forward here?
> >
> > I think it's fine to throw a WARNING in this case given that there is a
> > chance of inconsistent data.
>
> IMHO, since the user has specifically asked for origin=NONE but we do
> not have any way to detect the origin during initial sync so I think
> this could be documented and we can also issue the WARNING.  So that
> users notice that part and carefully set up the replication.  OTOH, I
> do not think that giving an error is very inconvenient because we are
> already providing a new option "origin=NONE" so along with that lets
> force the user to choose between copy_data=off or copy_data=force and
> with that, there is no scope for mistakes.

Since Jonathan also had suggested to throw a warning as in [1] and
Hou-san also had suggested to throw a warning as in [2], I have made
changes to throw a warning and also documented the following contents
for the same:
+  
+   If the subscription is created with origin = none and
+   copy_data = true, it will check if the publisher has
+   subscribed to the same table from other publishers and, if so, throw a
+   warning to notify user to check the publisher tables. The user can ensure
+   that publisher tables do not have data which has an origin associated before
+   continuing with any other operations to prevent inconsistent data being
+   replicated.
+  

The changes for the same are available in the v42 patch at [3].
[1] - 
https://www.postgresql.org/message-id/afb653ba-e2b1-33a3-a54c-849f4466e1b4%40postgresql.org
[2] - 
https://www.postgresql.org/message-id/OS0PR01MB5716C383623ADAD64CE4841194719%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[3] - 
https://www.postgresql.org/message-id/CALDaNm2oLWsSYtOFLdOkbrpC94%3DJZzKMCYDJoiZaqAX_Hn%2BU9Q%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-08-25 Thread Dilip Kumar
On Mon, Aug 22, 2022 at 9:19 AM houzj.f...@fujitsu.com
 wrote:
>
> > Jonathan, Sawada-San, Hou-San, and others, what do you think is the best way
> > to move forward here?
>
> I think it's fine to throw a WARNING in this case given that there is a
> chance of inconsistent data.

IMHO, since the user has specifically asked for origin=NONE but we do
not have any way to detect the origin during initial sync so I think
this could be documented and we can also issue the WARNING.  So that
users notice that part and carefully set up the replication.  OTOH, I
do not think that giving an error is very inconvenient because we are
already providing a new option "origin=NONE" so along with that lets
force the user to choose between copy_data=off or copy_data=force and
with that, there is no scope for mistakes.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Handle infinite recursion in logical replication setup

2022-08-24 Thread vignesh C
On Mon, Aug 22, 2022 at 9:19 AM houzj.f...@fujitsu.com
 wrote:
>
> On Thursday, August 18, 2022 11:13 AM Amit Kapila  
> wrote:
> >
> > On Wed, Aug 17, 2022 at 12:34 PM Peter Smith 
> > wrote:
> > >
> > > On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila 
> > wrote:
> > > >
> > > > On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com
> > > >  wrote:
> > > > >
> > > > > On Tuesday, August 2, 2022 8:00 PM Amit Kapila
> >  wrote:
> > > > > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila
> >  wrote:
> > > > >
> > > > > Thanks for the summary.
> > > > >
> > > > > I think it's fine to make the user use the copy_data option more
> > > > > carefully to prevent duplicate copies by reporting an ERROR.
> > > > >
> > > > > But I also have similar concern with Sawada-san as it's possible
> > > > > for user to receive an ERROR in some unexpected cases.
> > > > >
> > > > > For example I want to build bi-directional setup between two nodes:
> > > > >
> > > > > Node A: TABLE test (has actual data) Node B: TABLE test (empty)
> > > > >
> > > > > Step 1:
> > > > > CREATE PUBLICATION on both Node A and B.
> > > > >
> > > > > Step 2:
> > > > > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > > > > -- this is fine as there is no data on Node B
> > > > >
> > > > > Step 3:
> > > > > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > > > > -- this should be fine as user needs to copy data from Node A to
> > > > > Node B,
> > > > > -- but we still report an error for this case.
> > > > >
> > > > > It looks a bit strict to report an ERROR in this case and it seems
> > > > > not easy to avoid this. So, personally, I think it might be better
> > > > > to document the correct steps to build the bi-directional
> > > > > replication and probably also docuemnt the steps to recover if
> > > > > user accidently did duplicate initial copy if not documented yet.
> > > > >
> > > > > In addition, we could also LOG some additional information about
> > > > > the ORIGIN and initial copy which might help user to analyze if 
> > > > > needed.
> > > > >
> > > >
> > > > But why LOG instead of WARNING? I feel in this case there is a
> > > > chance of inconsistent data so a WARNING like "publication "pub1"
> > > > could have data from multiple origins" can be given when the user
> > > > has specified
> > > > options: "copy_data = on, origin = NONE" while creating a
> > > > subscription. We give a WARNING during subscription creation when
> > > > the corresponding publication doesn't exist, eg.
> > > >
> > > > postgres=# create subscription sub1 connection 'dbname = postgres'
> > > > publication pub1;
> > > > WARNING:  publication "pub1" does not exist in the publisher
> > > >
> > > > Then, we can explain in docs how users can avoid data
> > > > inconsistencies while setting up replication.
> > > >
> > >
> > > I was wondering if this copy/origin case really should be a NOTICE.
> > >
> >
> > We usually give NOTICE for some sort of additional implicit information, 
> > e.g.,
> > when we create a slot during CREATE SUBSCRIPTION
> > command: "NOTICE: created replication slot "sub1" on publisher". IMO, this 
> > is
> > likely to be a problem of data inconsistency so I think here we can choose
> > between WARNING and LOG. I prefer WARNING but okay with LOG as well if
> > others feel so. I think we can change this later as well if required. We do 
> > have an
> > option to not do anything and just document it but I feel it is better to 
> > give user
> > some indication of problem here because not everyone reads each update of
> > documentation.
> >
> > Jonathan, Sawada-San, Hou-San, and others, what do you think is the best way
> > to move forward here?
>
> I think it's fine to throw a WARNING in this case given that there is a
> chance of inconsistent data.

Since there was no objections to change it to throw a warning, I have
made the changes for the same.

I have made one change for the documentation patch.
The current steps states that:
1. Create a publication on primary3.
2. Lock table t1 on primary2 and primary3 in EXCLUSIVE mode until the
setup is completed. There is no need to lock table t1 on primary1
because any data changes made will be synchronized while creating the
subscription with copy_data = true.
3. Create a subscription on primary1 to subscribe to primary3.
4. Create a subscription on primary2 to subscribe to primary3.
5. Create a subscription on primary3 to subscribe to primary1. Use
copy_data = true so that the existing table data is copied during
initial sync.
6. Create a subscription on primary3 to subscribe to primary2. Use
copy_data = false because the initial table data would have been
already copied in the previous step.
7. Now the replication setup between primaries primary1, primary2 and
primary3 is complete. Incremental changes made on any primary will be
replicated to the other two primaries.

We found a problem with the proposed steps. Here the problem is that
we lock table t1 on primary2/primary3 in step2, then we create 

RE: Handle infinite recursion in logical replication setup

2022-08-21 Thread houzj.f...@fujitsu.com
On Thursday, August 18, 2022 11:13 AM Amit Kapila  
wrote:
> 
> On Wed, Aug 17, 2022 at 12:34 PM Peter Smith 
> wrote:
> >
> > On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila 
> wrote:
> > >
> > > On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com
> > >  wrote:
> > > >
> > > > On Tuesday, August 2, 2022 8:00 PM Amit Kapila
>  wrote:
> > > > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila
>  wrote:
> > > >
> > > > Thanks for the summary.
> > > >
> > > > I think it's fine to make the user use the copy_data option more
> > > > carefully to prevent duplicate copies by reporting an ERROR.
> > > >
> > > > But I also have similar concern with Sawada-san as it's possible
> > > > for user to receive an ERROR in some unexpected cases.
> > > >
> > > > For example I want to build bi-directional setup between two nodes:
> > > >
> > > > Node A: TABLE test (has actual data) Node B: TABLE test (empty)
> > > >
> > > > Step 1:
> > > > CREATE PUBLICATION on both Node A and B.
> > > >
> > > > Step 2:
> > > > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > > > -- this is fine as there is no data on Node B
> > > >
> > > > Step 3:
> > > > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > > > -- this should be fine as user needs to copy data from Node A to
> > > > Node B,
> > > > -- but we still report an error for this case.
> > > >
> > > > It looks a bit strict to report an ERROR in this case and it seems
> > > > not easy to avoid this. So, personally, I think it might be better
> > > > to document the correct steps to build the bi-directional
> > > > replication and probably also docuemnt the steps to recover if
> > > > user accidently did duplicate initial copy if not documented yet.
> > > >
> > > > In addition, we could also LOG some additional information about
> > > > the ORIGIN and initial copy which might help user to analyze if needed.
> > > >
> > >
> > > But why LOG instead of WARNING? I feel in this case there is a
> > > chance of inconsistent data so a WARNING like "publication "pub1"
> > > could have data from multiple origins" can be given when the user
> > > has specified
> > > options: "copy_data = on, origin = NONE" while creating a
> > > subscription. We give a WARNING during subscription creation when
> > > the corresponding publication doesn't exist, eg.
> > >
> > > postgres=# create subscription sub1 connection 'dbname = postgres'
> > > publication pub1;
> > > WARNING:  publication "pub1" does not exist in the publisher
> > >
> > > Then, we can explain in docs how users can avoid data
> > > inconsistencies while setting up replication.
> > >
> >
> > I was wondering if this copy/origin case really should be a NOTICE.
> >
> 
> We usually give NOTICE for some sort of additional implicit information, e.g.,
> when we create a slot during CREATE SUBSCRIPTION
> command: "NOTICE: created replication slot "sub1" on publisher". IMO, this is
> likely to be a problem of data inconsistency so I think here we can choose
> between WARNING and LOG. I prefer WARNING but okay with LOG as well if
> others feel so. I think we can change this later as well if required. We do 
> have an
> option to not do anything and just document it but I feel it is better to 
> give user
> some indication of problem here because not everyone reads each update of
> documentation.
> 
> Jonathan, Sawada-San, Hou-San, and others, what do you think is the best way
> to move forward here?

I think it's fine to throw a WARNING in this case given that there is a
chance of inconsistent data.

Best regards,
Hou zj



Re: Handle infinite recursion in logical replication setup

2022-08-17 Thread Amit Kapila
On Wed, Aug 17, 2022 at 12:34 PM Peter Smith  wrote:
>
> On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila  wrote:
> >
> > On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, August 2, 2022 8:00 PM Amit Kapila  
> > > wrote:
> > > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  
> > > > wrote:
> > >
> > > Thanks for the summary.
> > >
> > > I think it's fine to make the user use the copy_data option more 
> > > carefully to
> > > prevent duplicate copies by reporting an ERROR.
> > >
> > > But I also have similar concern with Sawada-san as it's possible for user 
> > > to
> > > receive an ERROR in some unexpected cases.
> > >
> > > For example I want to build bi-directional setup between two nodes:
> > >
> > > Node A: TABLE test (has actual data)
> > > Node B: TABLE test (empty)
> > >
> > > Step 1:
> > > CREATE PUBLICATION on both Node A and B.
> > >
> > > Step 2:
> > > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > > -- this is fine as there is no data on Node B
> > >
> > > Step 3:
> > > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > > -- this should be fine as user needs to copy data from Node A to Node B,
> > > -- but we still report an error for this case.
> > >
> > > It looks a bit strict to report an ERROR in this case and it seems not 
> > > easy to
> > > avoid this. So, personally, I think it might be better to document the 
> > > correct
> > > steps to build the bi-directional replication and probably also docuemnt 
> > > the
> > > steps to recover if user accidently did duplicate initial copy if not
> > > documented yet.
> > >
> > > In addition, we could also LOG some additional information about the 
> > > ORIGIN and
> > > initial copy which might help user to analyze if needed.
> > >
> >
> > But why LOG instead of WARNING? I feel in this case there is a chance
> > of inconsistent data so a WARNING like "publication "pub1" could have
> > data from multiple origins" can be given when the user has specified
> > options: "copy_data = on, origin = NONE" while creating a
> > subscription. We give a WARNING during subscription creation when the
> > corresponding publication doesn't exist, eg.
> >
> > postgres=# create subscription sub1 connection 'dbname = postgres'
> > publication pub1;
> > WARNING:  publication "pub1" does not exist in the publisher
> >
> > Then, we can explain in docs how users can avoid data inconsistencies
> > while setting up replication.
> >
>
> I was wondering if this copy/origin case really should be a NOTICE.
>

We usually give NOTICE for some sort of additional implicit
information, e.g., when we create a slot during CREATE SUBSCRIPTION
command: "NOTICE: created replication slot "sub1" on publisher". IMO,
this is likely to be a problem of data inconsistency so I think here
we can choose between WARNING and LOG. I prefer WARNING but okay with
LOG as well if others feel so. I think we can change this later as
well if required. We do have an option to not do anything and just
document it but I feel it is better to give user some indication of
problem here because not everyone reads each update of documentation.

Jonathan, Sawada-San, Hou-San, and others, what do you think is the
best way to move forward here?

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-08-17 Thread Peter Smith
On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila  wrote:
>
> On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Tuesday, August 2, 2022 8:00 PM Amit Kapila  
> > wrote:
> > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  
> > > wrote:
> >
> > Thanks for the summary.
> >
> > I think it's fine to make the user use the copy_data option more carefully 
> > to
> > prevent duplicate copies by reporting an ERROR.
> >
> > But I also have similar concern with Sawada-san as it's possible for user to
> > receive an ERROR in some unexpected cases.
> >
> > For example I want to build bi-directional setup between two nodes:
> >
> > Node A: TABLE test (has actual data)
> > Node B: TABLE test (empty)
> >
> > Step 1:
> > CREATE PUBLICATION on both Node A and B.
> >
> > Step 2:
> > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > -- this is fine as there is no data on Node B
> >
> > Step 3:
> > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > -- this should be fine as user needs to copy data from Node A to Node B,
> > -- but we still report an error for this case.
> >
> > It looks a bit strict to report an ERROR in this case and it seems not easy 
> > to
> > avoid this. So, personally, I think it might be better to document the 
> > correct
> > steps to build the bi-directional replication and probably also docuemnt the
> > steps to recover if user accidently did duplicate initial copy if not
> > documented yet.
> >
> > In addition, we could also LOG some additional information about the ORIGIN 
> > and
> > initial copy which might help user to analyze if needed.
> >
>
> But why LOG instead of WARNING? I feel in this case there is a chance
> of inconsistent data so a WARNING like "publication "pub1" could have
> data from multiple origins" can be given when the user has specified
> options: "copy_data = on, origin = NONE" while creating a
> subscription. We give a WARNING during subscription creation when the
> corresponding publication doesn't exist, eg.
>
> postgres=# create subscription sub1 connection 'dbname = postgres'
> publication pub1;
> WARNING:  publication "pub1" does not exist in the publisher
>
> Then, we can explain in docs how users can avoid data inconsistencies
> while setting up replication.
>

I was wondering if this copy/origin case really should be a NOTICE.

See table [1]. It says WARNING is meant for "warnings of likey
problems". But this is not exactly a "likely" problem - IIUC we really
don't know if there is even any problem at all  we only know there
is the *potential* for a problem, but the user has to then judge it
for themselves, Perhaps WARNING may be a bit overkill for this
situation -  it might be unnecessarily scary to give false warnings.

OTOH, NOTICE [1] says it is for "information that might be helpful to
users" which seems more like what is needed here.

--
[1] 
https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Handle infinite recursion in logical replication setup

2022-08-16 Thread Amit Kapila
On Wed, Aug 17, 2022 at 8:48 AM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, August 2, 2022 8:00 PM Amit Kapila  
> wrote:
> > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  wrote:
>
> Thanks for the summary.
>
> I think it's fine to make the user use the copy_data option more carefully to
> prevent duplicate copies by reporting an ERROR.
>
> But I also have similar concern with Sawada-san as it's possible for user to
> receive an ERROR in some unexpected cases.
>
> For example I want to build bi-directional setup between two nodes:
>
> Node A: TABLE test (has actual data)
> Node B: TABLE test (empty)
>
> Step 1:
> CREATE PUBLICATION on both Node A and B.
>
> Step 2:
> CREATE SUBSCRIPTION on Node A with (copy_data = on)
> -- this is fine as there is no data on Node B
>
> Step 3:
> CREATE SUBSCRIPTION on Node B with (copy_data = on)
> -- this should be fine as user needs to copy data from Node A to Node B,
> -- but we still report an error for this case.
>
> It looks a bit strict to report an ERROR in this case and it seems not easy to
> avoid this. So, personally, I think it might be better to document the correct
> steps to build the bi-directional replication and probably also docuemnt the
> steps to recover if user accidently did duplicate initial copy if not
> documented yet.
>
> In addition, we could also LOG some additional information about the ORIGIN 
> and
> initial copy which might help user to analyze if needed.
>

But why LOG instead of WARNING? I feel in this case there is a chance
of inconsistent data so a WARNING like "publication "pub1" could have
data from multiple origins" can be given when the user has specified
options: "copy_data = on, origin = NONE" while creating a
subscription. We give a WARNING during subscription creation when the
corresponding publication doesn't exist, eg.

postgres=# create subscription sub1 connection 'dbname = postgres'
publication pub1;
WARNING:  publication "pub1" does not exist in the publisher

Then, we can explain in docs how users can avoid data inconsistencies
while setting up replication.

-- 
With Regards,
Amit Kapila.




RE: Handle infinite recursion in logical replication setup

2022-08-16 Thread houzj.f...@fujitsu.com
On Tuesday, August 2, 2022 8:00 PM Amit Kapila  wrote:
> On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  wrote:
> >
> > On Tue, Jul 26, 2022 at 7:13 AM Jonathan S. Katz 
> wrote:
> > >
> > > Thanks for the example. I agree that it is fairly simple to reproduce.
> > >
> > > I understand that "copy_data = force" is meant to protect a user
> > > from hurting themself. I'm not convinced that this is the best way to do 
> > > so.
> > >
> > > For example today I can subscribe to multiple publications that
> > > write to the same table. If I have a primary key on that table, and
> > > two of the subscriptions try to write an identical ID, we conflict.
> > > We don't have any special flags or modes to guard against that from
> > > happening, though we do have documentation on conflicts and managing
> them.
> > >
> > > AFAICT the same issue with "copy_data" also exists in the above
> > > scenario too, even without the "origin" attribute.
> > >
> >
> > That's true but there is no parameter like origin = NONE which
> > indicates that constraint violations or duplicate data problems won't
> > occur due to replication. In the current case, I think the situation
> > is different because a user has specifically asked not to replicate
> > any remote data by specifying origin = NONE, which should be dealt
> > differently. Note that current users or their setup won't see any
> > difference/change unless they specify the new parameter origin as
> > NONE.
> >
> 
> Let me try to summarize the discussion so that it is easier for others to 
> follow.
> The work in this thread is to avoid loops, and duplicate data in logical
> replication when the operations happened on the same table in multiple nodes.
> It has been explained in email [1] with an example of how a logical 
> replication
> setup can lead to duplicate or inconsistent data.
> 
> The idea to solve this problem is that we don't replicate data that is not
> generated locally which we can normally identify based on origin of data in
> WAL. The commit 366283961a achieves that for replication but still the
> problem can happen during initial sync which is performed internally via copy.
> We can't differentiate the data in heap based on origin. So, we decided to
> prohibit the subscription operations that can perform initial sync (ex. Create
> Subscription, Alter Subscription ... Refresh) by detecting that the publisher 
> has
> subscribed to the same table from some other publisher.
> 
> To prohibit the subscription operations, the currently proposed patch throws
> an error.  Then, it also provides a new copy_data option 'force' under which
> the user will still be able to perform the operation. This could be useful 
> when
> the user intentionally wants to replicate the initial data even if it 
> contains data
> from multiple nodes (for example, when in a multi-node setup, one decides to
> get the initial data from just one node and then allow replication of data to
> proceed from each of respective nodes).
> 
> The other alternative discussed was to just give a warning for subscription
> operations and probably document the steps for users to avoid it. But the
> problem with that is once the user sees this warning, it won't be able to do
> anything except recreate the setup, so why not give an error in the first 
> place?
> 
> Thoughts?

Thanks for the summary.

I think it's fine to make the user use the copy_data option more carefully to
prevent duplicate copies by reporting an ERROR.

But I also have similar concern with Sawada-san as it's possible for user to
receive an ERROR in some unexpected cases.

For example I want to build bi-directional setup between two nodes:

Node A: TABLE test (has actual data)
Node B: TABLE test (empty)

Step 1:
CREATE PUBLICATION on both Node A and B.

Step 2:
CREATE SUBSCRIPTION on Node A with (copy_data = on)
-- this is fine as there is no data on Node B

Step 3:
CREATE SUBSCRIPTION on Node B with (copy_data = on)
-- this should be fine as user needs to copy data from Node A to Node B,
-- but we still report an error for this case.

It looks a bit strict to report an ERROR in this case and it seems not easy to
avoid this. So, personally, I think it might be better to document the correct
steps to build the bi-directional replication and probably also docuemnt the
steps to recover if user accidently did duplicate initial copy if not
documented yet.

In addition, we could also LOG some additional information about the ORIGIN and
initial copy which might help user to analyze if needed.


- Some other thoughts about the duplicate initial copy problem.

Actually, I feel the better way to address the possible duplicate copy problem
is to provide a command like "bi_setup" which can help user build the
bi-directional replication in all nodes and can handle the initial copy
automtically. But that might be too far.

Another naive idea I once thought is that maybe we can add a publication option
like: data_source_in_bi_group. If data_source_in_

Re: Handle infinite recursion in logical replication setup

2022-08-07 Thread vignesh C
On Fri, Jul 29, 2022 at 10:51 AM vignesh C  wrote:
>
> On Fri, Jul 29, 2022 at 8:31 AM Peter Smith  wrote:
> >
> > Here are some comments for the patch v40-0001:
> >
> > ==
> >
> > 1. Commit message
> >
> > It might be better to always use 'copy_data = true' in favour of
> > 'copy_data = on' just for consistency with all the docs and the error
> > messages.
> >
> > ==
>
> Modified
>
> > 2. doc/src/sgml/ref/create_subscription.sgml
> >
> > @@ -386,6 +401,15 @@ CREATE SUBSCRIPTION  > class="parameter">subscription_name > can have non-existent publications.
> >
> >
> > +  
> > +   If the subscription is created with origin = NONE and
> > +   copy_data = true, it will check if the publisher has
> > +   subscribed to the same table from other publishers and, if so, throw an
> > +   error to prevent possible non-local data from being copied. The user can
> > +   override this check and continue with the copy operation by specifying
> > +   copy_data = force.
> > +  
> >
> > 2a.
> > It is interesting that you changed the note to say origin = NONE.
> > Personally, I prefer it written as you did, but I think maybe this
> > change does not belong in this patch. The suggestion for changing from
> > "none" to NONE is being discussed elsewhere in this thread and
> > probably all such changes should be done together (if at all) as a
> > separate patch. Until then I think this patch 0001 should just stay
> > consistent with whatever is already pushed on HEAD.
>
> Modified
>
> > 2b.
> > "possible no-local data". Maybe the terminology "local/non-local" is a
> > hangover from back when the subscription parameter was called local
> > instead of origin. I'm not sure if you want to change this or not, and
> > anyway I didn't have any better suggestions – so this comment is just
> > to bring it to your attention.
> >
> > ==
>
> Modified
>
> > 3. src/backend/commands/subscriptioncmds.c - DefGetCopyData
> >
> > + /*
> > + * The set of strings accepted here should match up with the
> > + * grammar's opt_boolean_or_string production.
> > + */
> > + if (pg_strcasecmp(sval, "false") == 0 ||
> > + pg_strcasecmp(sval, "off") == 0)
> > + return COPY_DATA_OFF;
> > + if (pg_strcasecmp(sval, "true") == 0 ||
> > + pg_strcasecmp(sval, "on") == 0)
> > + return COPY_DATA_ON;
> > + if (pg_strcasecmp(sval, "force") == 0)
> > + return COPY_DATA_FORCE;
> >
> > I understand the intention of the comment, but it is not strictly
> > correct to say "should match up" because "force" is a new value.
> > Perhaps the comment should be as suggested below.
> >
> > SUGGESTION
> > The set of strings accepted here must include all those accepted by
> > the grammar's opt_boolean_or_string production.
> >
> > ~~
>
> Modified
>
> >
> > 4. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
> >
> > @@ -1781,6 +1858,122 @@ AlterSubscriptionOwner_oid(Oid subid, Oid 
> > newOwnerId)
> >   table_close(rel, RowExclusiveLock);
> >  }
> >
> > +/*
> > + * Check and throw an error if the publisher has subscribed to the same 
> > table
> > + * from some other publisher. This check is required only if "copy_data = 
> > on"
> > + * and "origin = NONE" for CREATE SUBSCRIPTION and
> > + * ALTER SUBSCRIPTION ... REFRESH statements to avoid the publisher from
> > + * replicating data that has an origin.
> > + *
> > + * This check need not be performed on the tables that are already added as
> > + * incremental sync for such tables will happen through WAL and the origin 
> > of
> > + * the data can be identified from the WAL records.
> > + *
> > + * subrel_local_oids contains the list of relation oids that are already
> > + * present on the subscriber.
> > + */
> > +static void
> > +check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
> > +CopyData copydata, char *origin,
> > +Oid *subrel_local_oids, int subrel_count)
> >
> > 4a.
> > "copy_data = on" -> "copy_data = true" (for consistency with the docs
> > and the error messages)
>
> Modified
>
> > 4b.
> > The same NONE/none review comment from #2a applies here too. Probably
> > it should be written as none for now unless/until *everything* changes
> > to NONE.
>
> Modified
>
> > 4c.
> > "to avoid the publisher from replicating data that has an origin." ->
> > "to avoid replicating data that has an origin."
>
> Modified
>
> > 4d.
> > + * This check need not be performed on the tables that are already added as
> > + * incremental sync for such tables will happen through WAL and the origin 
> > of
> > + * the data can be identified from the WAL records.
> >
> > SUGGESTION (maybe?)
> > This check need not be performed on the tables that are already added
> > because incremental sync for those tables will happen through WAL and
> > the origin of the data can be identified from the WAL records.
> >
> > ==
>
> Modified
>
> >
> > 5. src/test/subscription/t/030_origin.pl
> >
> > + "Refresh publication when the publisher has subscribed for the new table"
> >
> > SUGGESTION 

Re: Handle infinite recursion in logical replication setup

2022-08-04 Thread Amit Kapila
On Thu, Aug 4, 2022 at 6:31 PM Masahiko Sawada  wrote:
>
> On Tue, Aug 2, 2022 at 8:59 PM Amit Kapila  wrote:
> >
> > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  wrote:
> >
> > Let me try to summarize the discussion so that it is easier for others
> > to follow. The work in this thread is to avoid loops, and duplicate
> > data in logical replication when the operations happened on the same
> > table in multiple nodes. It has been explained in email [1] with an
> > example of how a logical replication setup can lead to duplicate or
> > inconsistent data.
> >
> > The idea to solve this problem is that we don't replicate data that is
> > not generated locally which we can normally identify based on origin
> > of data in WAL. The commit 366283961a achieves that for replication
> > but still the problem can happen during initial sync which is
> > performed internally via copy. We can't differentiate the data in heap
> > based on origin. So, we decided to prohibit the subscription
> > operations that can perform initial sync (ex. Create Subscription,
> > Alter Subscription ... Refresh) by detecting that the publisher has
> > subscribed to the same table from some other publisher.
> >
> > To prohibit the subscription operations, the currently proposed patch
> > throws an error.  Then, it also provides a new copy_data option
> > 'force' under which the user will still be able to perform the
> > operation. This could be useful when the user intentionally wants to
> > replicate the initial data even if it contains data from multiple
> > nodes (for example, when in a multi-node setup, one decides to get the
> > initial data from just one node and then allow replication of data to
> > proceed from each of respective nodes).
> >
> > The other alternative discussed was to just give a warning for
> > subscription operations and probably document the steps for users to
> > avoid it. But the problem with that is once the user sees this
> > warning, it won't be able to do anything except recreate the setup, so
> > why not give an error in the first place?
> >
> > Thoughts?
>
> Thank you for the summary!
>
> I understand that this feature could help some cases, but I'm really
> not sure adding new value 'force' for them is worthwhile, and
> concerned it could reduce the usability.
>
> IIUC this feature would work only when origin = 'none' and copy_data =
> 'on', and the purpose is to prevent the data from being
> duplicated/conflicted by the initial table sync. But there are cases
> where duplication/conflict doesn't happen even if origin = 'none' and
> copy_data = 'on'. For instance, the table on the publisher might be
> empty.
>

Right, but if we want we can check the tables on publishers to ensure
that. Now, another case could be where the corresponding subscription
was disabled on publisher during create subscription but got enabled
just before copy, we can even catch that situation as we are doing for
column lists in fetch_remote_table_info(). Are there other cases of
false positives you can think of?  I see your point that we can
document that users should be careful with certain configurations to
avoid data inconsistency but not able completely convince myself about
the same. I think the main thing to decide here is how much we want to
ask users to be careful by referring them to docs. Now, if you are not
convinced with giving an ERROR here then we can probably show a
WARNING (that we might copy data for multiple origins during initial
sync in spite of the user having specified origin as NONE)?

>
 Also, even with origin = 'any', copy_data = 'on', there is a
> possibility of data duplication/conflict. Why do we need to address
> only the case where origin = 'none'?
>

Because the user has specifically asked not to replicate any remote
data by specifying origin = NONE, which should be dealt with
differently whereas 'any' doesn't have such a requirement. Now,
tomorrow, if we want to support replication based on specific origin
names say origin = 'node-1' then also we won't be able to identify the
data during initial sync but I think 'none' is a special case where
giving some intimation to user won't be a bad idea especially because
we can identify the same.

> I think that using origin =
> 'none' doesn't necessarily mean using bi-directional (or N-way)
> replication. Even when using uni-directional logical replication with
> two nodes, they may use origin = 'none'.
>

It is possible but still, I think it is a must for bi-directional (or
N-way) replication, otherwise, there is a risk of loops.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-08-04 Thread Masahiko Sawada
On Tue, Aug 2, 2022 at 8:59 PM Amit Kapila  wrote:
>
> On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  wrote:
> >
> > On Tue, Jul 26, 2022 at 7:13 AM Jonathan S. Katz  
> > wrote:
> > >
> > > Thanks for the example. I agree that it is fairly simple to reproduce.
> > >
> > > I understand that "copy_data = force" is meant to protect a user from
> > > hurting themself. I'm not convinced that this is the best way to do so.
> > >
> > > For example today I can subscribe to multiple publications that write to
> > > the same table. If I have a primary key on that table, and two of the
> > > subscriptions try to write an identical ID, we conflict. We don't have
> > > any special flags or modes to guard against that from happening, though
> > > we do have documentation on conflicts and managing them.
> > >
> > > AFAICT the same issue with "copy_data" also exists in the above scenario
> > > too, even without the "origin" attribute.
> > >
> >
> > That's true but there is no parameter like origin = NONE which
> > indicates that constraint violations or duplicate data problems won't
> > occur due to replication. In the current case, I think the situation
> > is different because a user has specifically asked not to replicate
> > any remote data by specifying origin = NONE, which should be dealt
> > differently. Note that current users or their setup won't see any
> > difference/change unless they specify the new parameter origin as
> > NONE.
> >
>
> Let me try to summarize the discussion so that it is easier for others
> to follow. The work in this thread is to avoid loops, and duplicate
> data in logical replication when the operations happened on the same
> table in multiple nodes. It has been explained in email [1] with an
> example of how a logical replication setup can lead to duplicate or
> inconsistent data.
>
> The idea to solve this problem is that we don't replicate data that is
> not generated locally which we can normally identify based on origin
> of data in WAL. The commit 366283961a achieves that for replication
> but still the problem can happen during initial sync which is
> performed internally via copy. We can't differentiate the data in heap
> based on origin. So, we decided to prohibit the subscription
> operations that can perform initial sync (ex. Create Subscription,
> Alter Subscription ... Refresh) by detecting that the publisher has
> subscribed to the same table from some other publisher.
>
> To prohibit the subscription operations, the currently proposed patch
> throws an error.  Then, it also provides a new copy_data option
> 'force' under which the user will still be able to perform the
> operation. This could be useful when the user intentionally wants to
> replicate the initial data even if it contains data from multiple
> nodes (for example, when in a multi-node setup, one decides to get the
> initial data from just one node and then allow replication of data to
> proceed from each of respective nodes).
>
> The other alternative discussed was to just give a warning for
> subscription operations and probably document the steps for users to
> avoid it. But the problem with that is once the user sees this
> warning, it won't be able to do anything except recreate the setup, so
> why not give an error in the first place?
>
> Thoughts?

Thank you for the summary!

I understand that this feature could help some cases, but I'm really
not sure adding new value 'force' for them is worthwhile, and
concerned it could reduce the usability.

IIUC this feature would work only when origin = 'none' and copy_data =
'on', and the purpose is to prevent the data from being
duplicated/conflicted by the initial table sync. But there are cases
where duplication/conflict doesn't happen even if origin = 'none' and
copy_data = 'on'. For instance, the table on the publisher might be
empty. Also, even with origin = 'any', copy_data = 'on', there is a
possibility of data duplication/conflict. Why do we need to address
only the case where origin = 'none'? I think that using origin =
'none' doesn't necessarily mean using bi-directional (or N-way)
replication. Even when using uni-directional logical replication with
two nodes, they may use origin = 'none'. Therefore, it seems to me
that this feature works only for a narrow situation and has false
positives.

Since it has been the user's responsibility not to try to make the
data inconsistent by the initial table sync, I think that it might be
sufficient if we note the risk in the documentation.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Handle infinite recursion in logical replication setup

2022-08-02 Thread Amit Kapila
On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila  wrote:
>
> On Tue, Jul 26, 2022 at 7:13 AM Jonathan S. Katz  wrote:
> >
> > Thanks for the example. I agree that it is fairly simple to reproduce.
> >
> > I understand that "copy_data = force" is meant to protect a user from
> > hurting themself. I'm not convinced that this is the best way to do so.
> >
> > For example today I can subscribe to multiple publications that write to
> > the same table. If I have a primary key on that table, and two of the
> > subscriptions try to write an identical ID, we conflict. We don't have
> > any special flags or modes to guard against that from happening, though
> > we do have documentation on conflicts and managing them.
> >
> > AFAICT the same issue with "copy_data" also exists in the above scenario
> > too, even without the "origin" attribute.
> >
>
> That's true but there is no parameter like origin = NONE which
> indicates that constraint violations or duplicate data problems won't
> occur due to replication. In the current case, I think the situation
> is different because a user has specifically asked not to replicate
> any remote data by specifying origin = NONE, which should be dealt
> differently. Note that current users or their setup won't see any
> difference/change unless they specify the new parameter origin as
> NONE.
>

Let me try to summarize the discussion so that it is easier for others
to follow. The work in this thread is to avoid loops, and duplicate
data in logical replication when the operations happened on the same
table in multiple nodes. It has been explained in email [1] with an
example of how a logical replication setup can lead to duplicate or
inconsistent data.

The idea to solve this problem is that we don't replicate data that is
not generated locally which we can normally identify based on origin
of data in WAL. The commit 366283961a achieves that for replication
but still the problem can happen during initial sync which is
performed internally via copy. We can't differentiate the data in heap
based on origin. So, we decided to prohibit the subscription
operations that can perform initial sync (ex. Create Subscription,
Alter Subscription ... Refresh) by detecting that the publisher has
subscribed to the same table from some other publisher.

To prohibit the subscription operations, the currently proposed patch
throws an error.  Then, it also provides a new copy_data option
'force' under which the user will still be able to perform the
operation. This could be useful when the user intentionally wants to
replicate the initial data even if it contains data from multiple
nodes (for example, when in a multi-node setup, one decides to get the
initial data from just one node and then allow replication of data to
proceed from each of respective nodes).

The other alternative discussed was to just give a warning for
subscription operations and probably document the steps for users to
avoid it. But the problem with that is once the user sees this
warning, it won't be able to do anything except recreate the setup, so
why not give an error in the first place?

Thoughts?

[1] - 
https://www.postgresql.org/message-id/CALDaNm1eJr6qXT9esVPzgc5Qvy4uMhV4kCCTSmxARKjf%2BMwcnw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




RE: Handle infinite recursion in logical replication setup

2022-08-02 Thread shiy.f...@fujitsu.com
On Fri, Jul 29, 2022 1:22 PM vignesh C  wrote:
> 
> On Fri, Jul 29, 2022 at 8:31 AM Peter Smith 
> wrote:
> >
> 
> Thanks for the comments, the attached v41 patch has the changes for the
> same.
> 

Thanks for updating the patch.

A comment for 0002 patch.

In the example in section 31.11.4 (Generic steps for adding a new primary to an
existing set of primaries), I think there should be a Step-6, corresponding to
the steps mentioned before. And part of Step-5 should actually be part of
Step-6.

Regards,
Shi yu


Re: Handle infinite recursion in logical replication setup

2022-08-01 Thread Peter Smith
On Mon, Aug 1, 2022 at 6:52 PM Amit Kapila  wrote:
>
> On Mon, Aug 1, 2022 at 1:32 PM Peter Smith  wrote:
> >
> > On Mon, Aug 1, 2022 at 3:27 PM shiy.f...@fujitsu.com
> >  wrote:
> > >
> > > On Fri, Jul 29, 2022 1:22 PM vignesh C  wrote:
> > > >
> > > >
> > > > Thanks for the comments, the attached v41 patch has the changes for the
> > > > same.
> > > >
> > >
> > > Thanks for updating the patch.
> > >
> > > I wonder in the case that the publisher uses PG15 (or before), subscriber 
> > > uses
> > > PG16, should we have this check (check if publication tables were also
> > > subscribing from other publishers)? In this case, even if origin=none is
> > > specified, it doesn't work because the publisher doesn't filter the 
> > > origin. So
> > > maybe we don't need the check for initial sync. Thoughts?
> > >
> >
> > IIUC for the scenario you've described (subscription origin=none and
> > publisher < PG16) the subscriber can end up getting extra data they
> > did not want, right?
> >
>
> Yes, because publishers won't have 'filtering based on origin' functionality.
>
> > So instead of just "don't need the check", maybe this combination
> > should throw ERROR, or at least a log a WARNING?
> >
>
> I am not sure if doing anything (ERROR or WARNING) would make sense
> because anyway later during replication there won't be any filtering.
>

I was suggesting stopping that replication from happening at all. If
the user specifically asked for 'origin=none' but the publisher could
not filter that (because < PG16) then I imagined some logic that would
just disable the subscription up-front. Isn't it preferable for the
subscriber to get no data at all then to get data the user
specifically said they did NOT want to get?

e.g. pseudo-code for the worker code something like below:

if (origin != ANY and publisher.server_version < PG16)
{
set subscription.option.disable_on_error = true;
throw ERROR ("publisher does not support origin=none - disabling
the subscription");
}

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Handle infinite recursion in logical replication setup

2022-08-01 Thread Amit Kapila
On Mon, Aug 1, 2022 at 1:32 PM Peter Smith  wrote:
>
> On Mon, Aug 1, 2022 at 3:27 PM shiy.f...@fujitsu.com
>  wrote:
> >
> > On Fri, Jul 29, 2022 1:22 PM vignesh C  wrote:
> > >
> > >
> > > Thanks for the comments, the attached v41 patch has the changes for the
> > > same.
> > >
> >
> > Thanks for updating the patch.
> >
> > I wonder in the case that the publisher uses PG15 (or before), subscriber 
> > uses
> > PG16, should we have this check (check if publication tables were also
> > subscribing from other publishers)? In this case, even if origin=none is
> > specified, it doesn't work because the publisher doesn't filter the origin. 
> > So
> > maybe we don't need the check for initial sync. Thoughts?
> >
>
> IIUC for the scenario you've described (subscription origin=none and
> publisher < PG16) the subscriber can end up getting extra data they
> did not want, right?
>

Yes, because publishers won't have 'filtering based on origin' functionality.

> So instead of just "don't need the check", maybe this combination
> should throw ERROR, or at least a log a WARNING?
>

I am not sure if doing anything (ERROR or WARNING) would make sense
because anyway later during replication there won't be any filtering.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-08-01 Thread Peter Smith
On Mon, Aug 1, 2022 at 3:27 PM shiy.f...@fujitsu.com
 wrote:
>
> On Fri, Jul 29, 2022 1:22 PM vignesh C  wrote:
> >
> >
> > Thanks for the comments, the attached v41 patch has the changes for the
> > same.
> >
>
> Thanks for updating the patch.
>
> I wonder in the case that the publisher uses PG15 (or before), subscriber uses
> PG16, should we have this check (check if publication tables were also
> subscribing from other publishers)? In this case, even if origin=none is
> specified, it doesn't work because the publisher doesn't filter the origin. So
> maybe we don't need the check for initial sync. Thoughts?
>

IIUC for the scenario you've described (subscription origin=none and
publisher < PG16) the subscriber can end up getting extra data they
did not want, right?

So instead of just "don't need the check", maybe this combination
should throw ERROR, or at least a log a WARNING?

--
KInd Regards,
Peter Smith.
Fujitsu Australia




RE: Handle infinite recursion in logical replication setup

2022-07-31 Thread shiy.f...@fujitsu.com
On Fri, Jul 29, 2022 1:22 PM vignesh C  wrote:
> 
> 
> Thanks for the comments, the attached v41 patch has the changes for the
> same.
> 

Thanks for updating the patch.

I wonder in the case that the publisher uses PG15 (or before), subscriber uses
PG16, should we have this check (check if publication tables were also
subscribing from other publishers)? In this case, even if origin=none is
specified, it doesn't work because the publisher doesn't filter the origin. So
maybe we don't need the check for initial sync. Thoughts?

Regards,
Shi yu


Re: Handle infinite recursion in logical replication setup

2022-07-28 Thread vignesh C
On Fri, Jul 29, 2022 at 8:31 AM Peter Smith  wrote:
>
> Here are some comments for the patch v40-0001:
>
> ==
>
> 1. Commit message
>
> It might be better to always use 'copy_data = true' in favour of
> 'copy_data = on' just for consistency with all the docs and the error
> messages.
>
> ==

Modified

> 2. doc/src/sgml/ref/create_subscription.sgml
>
> @@ -386,6 +401,15 @@ CREATE SUBSCRIPTION  class="parameter">subscription_name can have non-existent publications.
>
>
> +  
> +   If the subscription is created with origin = NONE and
> +   copy_data = true, it will check if the publisher has
> +   subscribed to the same table from other publishers and, if so, throw an
> +   error to prevent possible non-local data from being copied. The user can
> +   override this check and continue with the copy operation by specifying
> +   copy_data = force.
> +  
>
> 2a.
> It is interesting that you changed the note to say origin = NONE.
> Personally, I prefer it written as you did, but I think maybe this
> change does not belong in this patch. The suggestion for changing from
> "none" to NONE is being discussed elsewhere in this thread and
> probably all such changes should be done together (if at all) as a
> separate patch. Until then I think this patch 0001 should just stay
> consistent with whatever is already pushed on HEAD.

Modified

> 2b.
> "possible no-local data". Maybe the terminology "local/non-local" is a
> hangover from back when the subscription parameter was called local
> instead of origin. I'm not sure if you want to change this or not, and
> anyway I didn't have any better suggestions – so this comment is just
> to bring it to your attention.
>
> ==

Modified

> 3. src/backend/commands/subscriptioncmds.c - DefGetCopyData
>
> + /*
> + * The set of strings accepted here should match up with the
> + * grammar's opt_boolean_or_string production.
> + */
> + if (pg_strcasecmp(sval, "false") == 0 ||
> + pg_strcasecmp(sval, "off") == 0)
> + return COPY_DATA_OFF;
> + if (pg_strcasecmp(sval, "true") == 0 ||
> + pg_strcasecmp(sval, "on") == 0)
> + return COPY_DATA_ON;
> + if (pg_strcasecmp(sval, "force") == 0)
> + return COPY_DATA_FORCE;
>
> I understand the intention of the comment, but it is not strictly
> correct to say "should match up" because "force" is a new value.
> Perhaps the comment should be as suggested below.
>
> SUGGESTION
> The set of strings accepted here must include all those accepted by
> the grammar's opt_boolean_or_string production.
>
> ~~

Modified

>
> 4. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> @@ -1781,6 +1858,122 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
>   table_close(rel, RowExclusiveLock);
>  }
>
> +/*
> + * Check and throw an error if the publisher has subscribed to the same table
> + * from some other publisher. This check is required only if "copy_data = on"
> + * and "origin = NONE" for CREATE SUBSCRIPTION and
> + * ALTER SUBSCRIPTION ... REFRESH statements to avoid the publisher from
> + * replicating data that has an origin.
> + *
> + * This check need not be performed on the tables that are already added as
> + * incremental sync for such tables will happen through WAL and the origin of
> + * the data can be identified from the WAL records.
> + *
> + * subrel_local_oids contains the list of relation oids that are already
> + * present on the subscriber.
> + */
> +static void
> +check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
> +CopyData copydata, char *origin,
> +Oid *subrel_local_oids, int subrel_count)
>
> 4a.
> "copy_data = on" -> "copy_data = true" (for consistency with the docs
> and the error messages)

Modified

> 4b.
> The same NONE/none review comment from #2a applies here too. Probably
> it should be written as none for now unless/until *everything* changes
> to NONE.

Modified

> 4c.
> "to avoid the publisher from replicating data that has an origin." ->
> "to avoid replicating data that has an origin."

Modified

> 4d.
> + * This check need not be performed on the tables that are already added as
> + * incremental sync for such tables will happen through WAL and the origin of
> + * the data can be identified from the WAL records.
>
> SUGGESTION (maybe?)
> This check need not be performed on the tables that are already added
> because incremental sync for those tables will happen through WAL and
> the origin of the data can be identified from the WAL records.
>
> ==

Modified

>
> 5. src/test/subscription/t/030_origin.pl
>
> + "Refresh publication when the publisher has subscribed for the new table"
>
> SUGGESTION (Just to mention origin = none somehow. Maybe you can
> reword it better than this)
> Refresh publication when the publisher has subscribed for the new
> table, but the subscriber-side wants origin=none

Modified

Thanks for the comments, the attached v41 patch has the changes for the same.

Regards,
Vignesh
From de95301eed4dec2a76e40effb9c8b8132c2c7e5

Re: Handle infinite recursion in logical replication setup

2022-07-28 Thread Peter Smith
Here are some comments for the patch v40-0001:

==

1. Commit message

It might be better to always use 'copy_data = true' in favour of
'copy_data = on' just for consistency with all the docs and the error
messages.

==

2. doc/src/sgml/ref/create_subscription.sgml

@@ -386,6 +401,15 @@ CREATE SUBSCRIPTION subscription_name

+  
+   If the subscription is created with origin = NONE and
+   copy_data = true, it will check if the publisher has
+   subscribed to the same table from other publishers and, if so, throw an
+   error to prevent possible non-local data from being copied. The user can
+   override this check and continue with the copy operation by specifying
+   copy_data = force.
+  

2a.
It is interesting that you changed the note to say origin = NONE.
Personally, I prefer it written as you did, but I think maybe this
change does not belong in this patch. The suggestion for changing from
"none" to NONE is being discussed elsewhere in this thread and
probably all such changes should be done together (if at all) as a
separate patch. Until then I think this patch 0001 should just stay
consistent with whatever is already pushed on HEAD.

2b.
"possible no-local data". Maybe the terminology "local/non-local" is a
hangover from back when the subscription parameter was called local
instead of origin. I'm not sure if you want to change this or not, and
anyway I didn't have any better suggestions – so this comment is just
to bring it to your attention.

==

3. src/backend/commands/subscriptioncmds.c - DefGetCopyData

+ /*
+ * The set of strings accepted here should match up with the
+ * grammar's opt_boolean_or_string production.
+ */
+ if (pg_strcasecmp(sval, "false") == 0 ||
+ pg_strcasecmp(sval, "off") == 0)
+ return COPY_DATA_OFF;
+ if (pg_strcasecmp(sval, "true") == 0 ||
+ pg_strcasecmp(sval, "on") == 0)
+ return COPY_DATA_ON;
+ if (pg_strcasecmp(sval, "force") == 0)
+ return COPY_DATA_FORCE;

I understand the intention of the comment, but it is not strictly
correct to say "should match up" because "force" is a new value.
Perhaps the comment should be as suggested below.

SUGGESTION
The set of strings accepted here must include all those accepted by
the grammar's opt_boolean_or_string production.

~~~

4. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed

@@ -1781,6 +1858,122 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
  table_close(rel, RowExclusiveLock);
 }

+/*
+ * Check and throw an error if the publisher has subscribed to the same table
+ * from some other publisher. This check is required only if "copy_data = on"
+ * and "origin = NONE" for CREATE SUBSCRIPTION and
+ * ALTER SUBSCRIPTION ... REFRESH statements to avoid the publisher from
+ * replicating data that has an origin.
+ *
+ * This check need not be performed on the tables that are already added as
+ * incremental sync for such tables will happen through WAL and the origin of
+ * the data can be identified from the WAL records.
+ *
+ * subrel_local_oids contains the list of relation oids that are already
+ * present on the subscriber.
+ */
+static void
+check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
+CopyData copydata, char *origin,
+Oid *subrel_local_oids, int subrel_count)

4a.
"copy_data = on" -> "copy_data = true" (for consistency with the docs
and the error messages)

4b.
The same NONE/none review comment from #2a applies here too. Probably
it should be written as none for now unless/until *everything* changes
to NONE.

4c.
"to avoid the publisher from replicating data that has an origin." ->
"to avoid replicating data that has an origin."

4d.
+ * This check need not be performed on the tables that are already added as
+ * incremental sync for such tables will happen through WAL and the origin of
+ * the data can be identified from the WAL records.

SUGGESTION (maybe?)
This check need not be performed on the tables that are already added
because incremental sync for those tables will happen through WAL and
the origin of the data can be identified from the WAL records.

==

5. src/test/subscription/t/030_origin.pl

+ "Refresh publication when the publisher has subscribed for the new table"

SUGGESTION (Just to mention origin = none somehow. Maybe you can
reword it better than this)
Refresh publication when the publisher has subscribed for the new
table, but the subscriber-side wants origin=none

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-28 Thread vignesh C
On Thu, Jul 28, 2022 at 11:28 AM Peter Smith  wrote:
>
> Hi Vignesh.
>
> FYI the v39* patch fails to apply [1]. Can you please rebase it?
>
>
> [1]
> === Applying patches on top of PostgreSQL commit ID
> 5f858dd3bebd1f3845aef2bff7f4345bfb7b74b3 ===
> === applying patch
> ./v39-0001-Check-and-throw-an-error-if-publication-tables-w.patch
> patching file doc/src/sgml/ref/alter_subscription.sgml
> patching file doc/src/sgml/ref/create_subscription.sgml
> patching file src/backend/commands/subscriptioncmds.c
> Hunk #10 FAILED at 886.
> 1 out of 14 hunks FAILED -- saving rejects to file
> src/backend/commands/subscriptioncmds.c.rej
> patching file src/test/regress/expected/subscription.out
> patching file src/test/regress/sql/subscription.sql
> patching file src/test/subscription/t/030_origin.pl
> patching file src/tools/pgindent/typedefs.list
>
> --

Please find the v40 patch attached which is rebased on top of head.

Regards,
Vignesh
From e59c864801c066aff069deb528427788bd0cff0a Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Thu, 28 Jul 2022 19:11:35 +0530
Subject: [PATCH v40 1/2] Check and throw an error if publication tables were
 also subscribing from other publishers and support force value for copy_data
 parameter.

This patch does a couple of things:
1) Checks and throws an error if 'copy_data = on' and 'origin =
none' but the publication tables were also replicated from other publishers.
2) Adds 'force' value for copy_data parameter.

---
The steps below help to demonstrate how the new exception is useful:

The initial copy phase has no way to know the origin of the row data,
so if 'copy_data = on' in the step 4 below, then an error will be
thrown to prevent any potentially non-local data from being copied:

e.g.
CREATE SUBSCRIPTION sub_node2_node1 CONNECTION ''
PUBLICATION pub_node1 WITH (copy_data = on, origin = none);
ERROR:  CREATE/ALTER SUBSCRIPTION with origin = none and copy_data = on is
not allowed when the publisher might have replicated data.

---
The following steps help to demonstrate how the 'copy_data = force'
change will be useful:

Let's take a scenario where the user wants to set up bidirectional
logical replication between node1 and node2 where the same table on
node1 has pre-existing data and node2 has no pre-existing data.

e.g.
node1: Table t1 (c1 int) has data 11, 12, 13, 14
node2: Table t1 (c1 int) has no pre-existing data

The following steps are required in this case:
step 1:
node1=# CREATE PUBLICATION pub_node1 FOR TABLE t1;
CREATE PUBLICATION

step 2:
node2=# CREATE PUBLICATION pub_node2 FOR TABLE t1;
CREATE PUBLICATION

step 3:
node1=# CREATE SUBSCRIPTION sub_node1_node2 CONNECTION ''
node1-# PUBLICATION pub_node2;
CREATE SUBSCRIPTION

step 4:
node2=# CREATE SUBSCRIPTION sub_node2_node1 CONNECTION ''
node2-# PUBLICATION pub_node1;
CREATE SUBSCRIPTION

After the subscription is created on node2, node1 will be synced to
node2 and the newly synced data will be sent to node2. This process of
node1 sending data to node2 and node2 sending data to node1 will repeat
infinitely. If table t1 has a unique key, this will lead to a unique key
violation and replication won't proceed.

This problem can be avoided by using origin and copy_data parameters as given
below:
Step 1 & Step 2 are same as above.

step 3: Create a subscription on node1 to subscribe to node2:
node1=# CREATE SUBSCRIPTION sub_node1_node2 CONNECTION ''
node1-# PUBLICATION pub_node2 WITH (copy_data = off, origin = none);
CREATE SUBSCRIPTION

step 4: Create a subscription on node2 to subscribe to node1. Use
'copy_data = force' when creating a subscription to node1 so that the
existing table data is copied during initial sync:
node2=# CREATE SUBSCRIPTION sub_node2_node1 CONNECTION ''
node2-# PUBLICATION pub_node1 WITH (copy_data = force, origin = none);
CREATE SUBSCRIPTION

Author: Vignesh C
Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubh...@mail.gmail.com
---
 doc/src/sgml/ref/alter_subscription.sgml   |  14 +-
 doc/src/sgml/ref/create_subscription.sgml  |  32 +-
 src/backend/commands/subscriptioncmds.c| 215 +++-
 src/test/regress/expected/subscription.out |  22 +-
 src/test/regress/sql/subscription.sql  |  14 +
 src/test/subscription/t/030_origin.pl  | 382 ++---
 src/tools/pgindent/typedefs.list   |   1 +
 7 files changed, 613 insertions(+), 67 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 64efc21f53..f4fb9c5282 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -161,12 +161,20 @@ ALTER SUBSCRIPTION name RENAME TO <
 
   

-copy_data (boolean)
+copy_d

Re: Handle infinite recursion in logical replication setup

2022-07-28 Thread vignesh C
On Thu, Jul 28, 2022 at 11:28 AM Peter Smith  wrote:
>
> Hi Vignesh.
>
> FYI the v39* patch fails to apply [1]. Can you please rebase it?
>
>
> [1]
> === Applying patches on top of PostgreSQL commit ID
> 5f858dd3bebd1f3845aef2bff7f4345bfb7b74b3 ===
> === applying patch
> ./v39-0001-Check-and-throw-an-error-if-publication-tables-w.patch
> patching file doc/src/sgml/ref/alter_subscription.sgml
> patching file doc/src/sgml/ref/create_subscription.sgml
> patching file src/backend/commands/subscriptioncmds.c
> Hunk #10 FAILED at 886.
> 1 out of 14 hunks FAILED -- saving rejects to file
> src/backend/commands/subscriptioncmds.c.rej
> patching file src/test/regress/expected/subscription.out
> patching file src/test/regress/sql/subscription.sql
> patching file src/test/subscription/t/030_origin.pl
> patching file src/tools/pgindent/typedefs.list

Thanks for reporting this, I will post an updated version for this soon.

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-07-27 Thread Peter Smith
Hi Vignesh.

FYI the v39* patch fails to apply [1]. Can you please rebase it?


[1]
=== Applying patches on top of PostgreSQL commit ID
5f858dd3bebd1f3845aef2bff7f4345bfb7b74b3 ===
=== applying patch
./v39-0001-Check-and-throw-an-error-if-publication-tables-w.patch
patching file doc/src/sgml/ref/alter_subscription.sgml
patching file doc/src/sgml/ref/create_subscription.sgml
patching file src/backend/commands/subscriptioncmds.c
Hunk #10 FAILED at 886.
1 out of 14 hunks FAILED -- saving rejects to file
src/backend/commands/subscriptioncmds.c.rej
patching file src/test/regress/expected/subscription.out
patching file src/test/regress/sql/subscription.sql
patching file src/test/subscription/t/030_origin.pl
patching file src/tools/pgindent/typedefs.list

--
[1] http://cfbot.cputube.org/patch_38_3610.log

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-26 Thread Amit Kapila
On Tue, Jul 26, 2022 at 10:23 AM Peter Smith  wrote:
>
> On Tue, Jul 26, 2022 at 2:09 PM Amit Kapila  wrote:
> >
> >
> > I am not really sure how much we gain by maintaining consistency with
> > slot_name because if due to this we have to change the error messages
> > as well then it can create an inconsistency with reserved origin
> > names. Consider message: DETAIL:  Origin names "any", "none", and
> > names starting with "pg_" are reserved. Now, if we change this to
> > "ANY", "NONE" in the above message, it will look a bit odd as "pg_"
> > starts with lower case letters.
> >
>
> Sure, the message looks a bit odd with the quotes like you wrote
> above, but I would not suggest to change it that way - I was thinking
> more like below (which is similar to the style the slot_name messages
> use)
>
> CURRENT
> DETAIL:  Origin names "any", "none", and names starting with "pg_" are 
> reserved.
>
> SUGGESTED
> DETAIL:  Origin names ANY, NONE, and names starting with "pg_" are reserved.
>

I see your point but not sure if that is an improvement over the
current one, so, let's wait and see if we get some other votes in
favor of your suggestion.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-07-26 Thread vignesh C
On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
>
> Firstly, I have some (case-sensitivity) questions about the previous
> patch which was already pushed [1].
>
> Q1. create_subscription docs
>
> I did not understand why the docs refer to slot_name = NONE, yet the
> newly added option says origin = none/any. I think that saying origin
> = NONE/ANY would be more consistent with the existing usage of NONE in
> this documentation.
>
> ~~~
>
> Q2. parse_subscription_options
>
> Similarly, in the code (parse_subscription_options), I did not
> understand why the checks for special name values are implemented
> differently:
>
> The new 'origin' code is using pg_strcmpcase to check special values
> (none/any), and the old 'slot_name' code uses case-sensitive strcmp to
> check the special value (none).
>
> FWIW, here I thought the new origin code is the correct one.
>
> ==
>
> Now, here are some review comments for the patch v38-0001:
>
> 1. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> @@ -1781,6 +1858,121 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
>   table_close(rel, RowExclusiveLock);
>  }
>
> +/*
> + * Check and throw an error if the publisher has subscribed to the same table
> + * from some other publisher. This check is required only if copydata is ON 
> and
> + * the origin is local for CREATE SUBSCRIPTION and
> + * ALTER SUBSCRIPTION ... REFRESH statements to avoid replicating remote data
> + * from the publisher.
> + *
> + * This check need not be performed on the tables that are already added as
> + * incremental sync for such tables will happen through WAL and the origin of
> + * the data can be identified from the WAL records.
> + *
> + * subrel_local_oids contains the list of relation oids that are already
> + * present on the subscriber.
> + */
> +static void
> +check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
> +CopyData copydata, char *origin,
> +Oid *subrel_local_oids, int subrel_count)
>
> 1a.
> "copydata is ON" --> "copy_data = on" (because the comment is talking
> about the CREATE/ALTER statements, so it seemed a bit confusing to
> refer to the copydata function param instead of the copy_data
> subscription parameter)

Modified

> 1b.
> "the origin is local" ?? But, "local" was the old special name value.
> Now it is "none", so I think this part needs minor rewording.

Modified

>
> ~~~
>
> 2.
>
> + if (copydata != COPY_DATA_ON || !origin ||
> + (pg_strcasecmp(origin, "none") != 0))
> + return;
>
> Should this be using the constant LOGICALREP_ORIGIN_NONE?

Modified

> ~~~
>
> 3.
>
> + /*
> + * Throw an error if the publisher has subscribed to the same table
> + * from some other publisher. We cannot differentiate between the
> + * origin and non-origin data that is present in the HEAP during the
> + * initial sync. Identification of non-origin data can be done only
> + * from the WAL by using the origin id.
> + *
> + * XXX: For simplicity, we don't check whether the table has any data
> + * or not. If the table doesn't have any data then we don't need to
> + * distinguish between local and non-local data so we can avoid
> + * throwing an error in that case.
> + */
>
> 3a.
> When the special origin value changed from "local" to "none" this
> comment's first part seems to have got a bit lost in translation.

Modified

> SUGGESTION:
> Throw an error if the publisher has subscribed to the same table from
> some other publisher. We cannot know the origin of data during the
> initial sync. Data origins can be found only from the WAL by looking
> at the origin id.

Modified

> 3b.
> I think referring to "local and non-local" data in the XXX part of
> this comment also needs some minor rewording now that "local" is not a
> special origin name anymore.

Modified

Thanks for the comments, the v39 patch shared at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm2POATc_jwQ-8MBJgGCVZGdUNhnTv8zkBuGzLaY03dM%3DA%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-07-26 Thread vignesh C
On Tue, Jul 26, 2022 at 2:12 PM wangw.f...@fujitsu.com
 wrote:
>
> On Sun, Jul 24, 2022 1:28 AM vignesh C  wrote:
> > Added a  note for the same and referred it to the conflicts section.
> >
> > Thanks for the comments, the attached v38 patch has the changes for the 
> > same.
>
> Thanks for your patches.
>
> Two slight comments on the below message in the 0001 patch:
>
> The error message in the function check_pub_table_subscribed().
> +   ereport(ERROR,
> +   
> errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +   errmsg("could not replicate table \"%s.%s\"",
> +  nspname, relname),
> +   errdetail("CREATE/ALTER SUBSCRIPTION with 
> origin = none and copy_data = on is not allowed when the publisher has 
> subscribed same table."),
> +   errhint("Use CREATE/ALTER SUBSCRIPTION with 
> copy_data = off/force."));
>
> 1.
> I think it might be better to use true/false here than on/off.
> Just for consistency with another error message
> (in function parse_subscription_options) and the description of this parameter
> in the PG document.

Modified

> If you agree with this, please also kindly consider the attached
> "slight_modification.diff" file.
> (This is a slight modification for the second patch, just replace "off" with
> "false" in PG document.)

I have updated the documentation similarly

> 2.
> How about replacing "origin = none" and "copy_data = on" in the message with
> "%s"? I think this might be better for translation. Just like the following
> error message in the function parse_subscription_options:
> ```
> if (opts->copy_data &&
> IsSet(opts->specified_opts, SUBOPT_COPY_DATA))
> ereport(ERROR,
> (errcode(ERRCODE_SYNTAX_ERROR),
>  errmsg("%s and %s are mutually 
> exclusive options",
> "connect = false", 
> "copy_data = true/force")));
> ```

Modified

Thanks for the comments, the v39 patch shared at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm2POATc_jwQ-8MBJgGCVZGdUNhnTv8zkBuGzLaY03dM%3DA%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-07-26 Thread vignesh C
On Tue, Jul 26, 2022 at 1:35 PM shiy.f...@fujitsu.com
 wrote:
>
> On Sun, Jul 24, 2022 1:28 AM vignesh C  wrote:
> >
> > Added a  note for the same and referred it to the conflicts section.
> >
> > Thanks for the comments, the attached v38 patch has the changes for the
> > same.
> >
>
> Thanks for updating the patch. A comment on the test in 0001 patch.
>
> +# Alter subscription ... refresh publication should fail when a new table is
> +# subscribing data from a different publication should fail
> +($result, $stdout, $stderr) = $node_A->psql(
> +   'postgres', "
> +ALTER SUBSCRIPTION tap_sub_A2 REFRESH PUBLICATION");
> +like(
> +   $stderr,
> +   qr/ERROR: ( [A-Z0-9]+:)? could not replicate table "public.tab_new"/,
> +   "Create subscription with origin and copy_data having replicated 
> table in publisher"
> +);
>
> The comment says "should fail" twice, the latter one can be removed.

Modified

> Besides, "Create subscription with origin and copy_data" should be changed to
> "Alter subscription with origin and copy_data" I think.

Modified to "Refresh publication"

Thanks for the comments, the v39 patch shared at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm2POATc_jwQ-8MBJgGCVZGdUNhnTv8zkBuGzLaY03dM%3DA%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-07-26 Thread vignesh C
On Tue, Jul 26, 2022 at 7:16 AM Peter Smith  wrote:
>
> Here are some review comments for the patch v38-0002:
>
> ==
>
>  - terminology
>
> There seemed to be an inconsistent alternation of the terms
> "primaries" and "nodes"... For example "Setting replication between
> two primaries" versus "Adding a new node..." (instead of "Adding a new
> primary..."?). I have included suggested minor rewording changes in
> the review comments below, but please check in case I miss something.
> Because I suggested changes to some titles maybe you will also want to
> change the section ids too.
>
> ~~~
>
> 1. Commit message
>
> The documentation was recently modified to remove the term
> "bidirectional replication" and replace it all with "replication
> between primaries", so this commit message (and also the patch name
> itself) should be similarly modified.

Modified

> ~~~
>
> 2.
> +   
> +Replication between primaries is useful for creating a multi-master
> +database environment for replicating write operations performed by any of
> +the member nodes. The steps to create replication between primaries in
> +various scenarios are given below. Note: User is responsible for 
> designing
> +their schemas in a way to minimize the risk of conflicts. See
> + for the details of 
> logical
> +replication conflicts. The logical replication restrictions applies to
> +the replication between primaries also. See
> + for the details of
> +logical replication restrictions.
> +   
>
> 2a.
> "User" -> "The user"

Modified

> 2b.
> "The logical replication restrictions applies to..." --> "The logical
> replication restrictions apply to..."

Modified

> 2c.
> These are important notes. Instead of just being part of the text
> blurb, perhaps these should be rendered as SGML  (or put them
> both in a single  if you want)

Modified

> ~~~
>
> 3. Setting replication between two primaries
>
> +   Setting replication between two primaries
> +   
> +The following steps demonstrate how to setup replication between two
> +primaries when there is no table data present on both nodes
> +node1 and node2:
> +   
>
> SUGGESTED
> The following steps demonstrate how to set up replication between two
> primaries (node1 and node2) when there is no table data present on
> both nodes:.

Modified

> ~~~
>
> 4.
> +   
> +Now the replication setup between two primaries node1
> +and node2 is complete. Any incremental changes from
> +node1 will be replicated to node2,
> +and any incremental changes from node2 will be
> +replicated to node1.
> +   
>
> "between two primaries" -> "between primaries"

Modified

> ~~~
>
> 5. Adding a new node when there is no table data on any of the nodes
>
> SUGGESTION (title)
> Adding a new primary when there is no table data on any of the primaries

Modified

> ~~~
>
> 6.
> +   
> +The following steps demonstrate adding a new node 
> node3
> +to the existing node1 and node2 
> when
> +there is no t1 data on any of the nodes. This requires
>
> SUGGESTION
> The following steps demonstrate adding a new primary (node3) to the
> existing primaries (node1 and node2) when there is no t1 data on any
> of the nodes.

Modified

> ~~~
>
> 7. Adding a new node when table data is present on the existing nodes
>
> SUGGESTION (title)
> Adding a new primary when table data is present on the existing primaries

Modified

> ~~~
>
> 8.
> +
> + The following steps demonstrate adding a new node 
> node3
> + which has no t1 data to the existing
> + node1 and node2 where
> + t1 data is present. This needs similar steps; the 
> only
>
> SUGGESTION
> The following steps demonstrate adding a new primary (node3) that has
> no t1 data to the existing primaries (node1 and node2) where t1 data
> is present.

Modified

> ~~~
>
> 9. Adding a new node when table data is present on the new node
>
> SUGGESTION (title)
> Adding a new primary that has existing table data

Modified

> ~~~
>
> 10.
> +   
> +
> + Adding a new node when table data is present on the new node is not
> + supported.
> +
> +   
>
> SUGGESTION
> Adding a new primary that has existing table data is not supported.

Modified

> ~~~
>
> 11. Generic steps for adding a new node to an existing set of primaries
>
> SUGGESTION (title)
> Generic steps for adding a new primary to an existing set of primaries

Modified

Thanks for the comments, the v39 patch shared at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm2POATc_jwQ-8MBJgGCVZGdUNhnTv8zkBuGzLaY03dM%3DA%40mail.gmail.com

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-07-26 Thread vignesh C
On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz  wrote:
>
> On 7/22/22 12:47 AM, Amit Kapila wrote:
> > On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz  
> > wrote:
>
> >> 1. I'm concerned by calling this "Bidirectional replication" in the docs
> >> that we are overstating the current capabilities. I think this is
> >> accentuated int he opening paragraph:
> >>
> >> ==snip==
> >>Bidirectional replication is useful for creating a multi-master database
> >>environment for replicating read/write operations performed by any of 
> >> the
> >>member nodes.
> >> ==snip==
> >>
> >> For one, we're not replicating reads, we're replicating writes. Amongst
> >> the writes, at this point we're only replicating DML. A reader could
> >> think that deploying can work for a full bidirectional solution.
> >>
> >> (Even if we're aspirationally calling this section "Bidirectional
> >> replication", that does make it sound like we're limited to two nodes,
> >> when we can support more than two).
> >>
> >
> > Right, I think the system can support N-Way replication.
>
> I did some more testing of the feature, i.e. doing 3-node and 4-node
> tests. While logical replication today can handle replicating between
> multiple nodes (N-way), the "origin = none" does require setting up
> subscribers between each of the nodes.
>
> For example, if I have 4 nodes A, B, C, D and I want to replicate the
> same table between all of them, I need to set up subscribers between all
> of them (A<=>B, A<=>C, A<=>D, B<=>C, B<=>D, C<=>D). However, each node
> can replicate between each other in a way that's convenient (vs. having
> to do something funky with partitions) so this is still a big step forward.
>
> This is a long way of saying that I do think it's fair to say we support
> "N-way" replication so long as you are set up in a mesh (vs. a ring,
> e.g. A=>B=>C=>D=>A).
>
> > Among the above "Replicating changes between primaries" sounds good to
> > me or simply "Replication between primaries". As this is a sub-section
> > on the Logical Replication page, I feel it is okay to not use Logical
> > in the title.
>
> Agreed, I think that's fine.
>
> >> At a minimum, I think we should reference the documentation we have in
> >> the logical replication section on conflicts. We may also want to advise
> >> that a user is responsible for designing their schemas in a way to
> >> minimize the risk of conflicts.
> >>
> >
> > This sounds reasonable to me.
> >
> > One more point about docs, it appears to be added as the last
> > sub-section on the Logical Replication page. Is there a reason for
> > doing so? I feel this should be third sub-section after describing
> > Publication and Subscription.
>
> When I first reviewed, I had not built the docs. Did so on this pass.
>
> I agree with the positioning argument, i.e. it should go after
> "Subscription" in the table of contents -- but it makes me question a
> couple of things:
>
> 1. The general ordering of the docs
> 2. How we describe that section (more on that in a sec)
> 3. If "row filters" should be part of "subscription" instead of its own
> section.
>
> If you look at the current order, "Quick setup" is the last section; one
> would think the "quick" portion goes first :) Given a lot of this is for
> the current docs, I may start a separate discussion on -docs for this part.
>
> For the time being, I agree it should be moved to the section after
> "Subscription".
>
> I think what this section describes is "Configuring Replication Between
> Nodes" as it covers a few different scenarios.
>
> I do think we need to iterate on these docs -- the examples with the
> commands are generally OK and easy to follow, but a few things I noticed:
>
> 1. The general description of the section needs work. We may want to
> refine the description of the use cases, and in the warning, link to
> instructions on how to take backups.

Modified

> 2. We put the "case not supported" in the middle, not at the end.

Modified

> 3. The "generic steps for adding a new node..." section uses a
> convention for steps that is not found in the docs. We also don't
> provide an example for this section, and this is the most complicated
> scenario to set up.

Modified

Thanks a lot for the suggestions, I have made the changes  for the
same in the v39 patch attached.

Regards,
Vignesh
From 29c2206e2440bee0ee9818b6a2af66dad5a5d5d4 Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Tue, 26 Jul 2022 22:59:15 +0530
Subject: [PATCH v39 1/2] Check and throw an error if publication tables were
 also subscribing from other publishers and support force value for copy_data
 parameter.

This patch does a couple of things:
1) Checks and throws an error if 'copy_data = on' and 'origin =
none' but the publication tables were also replicated from other publishers.
2) Adds 'force' value for copy_data parameter.

---
The steps below help to demonstrate how the new ex

RE: Handle infinite recursion in logical replication setup

2022-07-26 Thread wangw.f...@fujitsu.com
On Sun, Jul 24, 2022 1:28 AM vignesh C  wrote:
> Added a  note for the same and referred it to the conflicts section.
> 
> Thanks for the comments, the attached v38 patch has the changes for the same.

Thanks for your patches.

Two slight comments on the below message in the 0001 patch:

The error message in the function check_pub_table_subscribed().
+   ereport(ERROR,
+   
errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+   errmsg("could not replicate table \"%s.%s\"",
+  nspname, relname),
+   errdetail("CREATE/ALTER SUBSCRIPTION with 
origin = none and copy_data = on is not allowed when the publisher has 
subscribed same table."),
+   errhint("Use CREATE/ALTER SUBSCRIPTION with 
copy_data = off/force."));

1.
I think it might be better to use true/false here than on/off.
Just for consistency with another error message
(in function parse_subscription_options) and the description of this parameter
in the PG document.

If you agree with this, please also kindly consider the attached
"slight_modification.diff" file.
(This is a slight modification for the second patch, just replace "off" with
"false" in PG document.)

2.
How about replacing "origin = none" and "copy_data = on" in the message with
"%s"? I think this might be better for translation. Just like the following
error message in the function parse_subscription_options:
```
if (opts->copy_data &&
IsSet(opts->specified_opts, SUBOPT_COPY_DATA))
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
 errmsg("%s and %s are mutually 
exclusive options",
"connect = false", 
"copy_data = true/force")));
```

Regards,
Wang wei


slight_modification.diff
Description: slight_modification.diff


RE: Handle infinite recursion in logical replication setup

2022-07-26 Thread shiy.f...@fujitsu.com
On Sun, Jul 24, 2022 1:28 AM vignesh C  wrote:
> 
> Added a  note for the same and referred it to the conflicts section.
> 
> Thanks for the comments, the attached v38 patch has the changes for the
> same.
> 

Thanks for updating the patch. A comment on the test in 0001 patch.

+# Alter subscription ... refresh publication should fail when a new table is
+# subscribing data from a different publication should fail
+($result, $stdout, $stderr) = $node_A->psql(
+   'postgres', "
+ALTER SUBSCRIPTION tap_sub_A2 REFRESH PUBLICATION");
+like(
+   $stderr,
+   qr/ERROR: ( [A-Z0-9]+:)? could not replicate table "public.tab_new"/,
+   "Create subscription with origin and copy_data having replicated table 
in publisher"
+);

The comment says "should fail" twice, the latter one can be removed.

Besides, "Create subscription with origin and copy_data" should be changed to
"Alter subscription with origin and copy_data" I think.

Regards,
Shi yu


Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
On Tue, Jul 26, 2022 at 2:09 PM Amit Kapila  wrote:
>
> On Tue, Jul 26, 2022 at 5:04 AM Peter Smith  wrote:
> >
> > On Mon, Jul 25, 2022 at 7:33 PM vignesh C  wrote:
> > >
> > > On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  
> > > wrote:
> > > >
> > > > Firstly, I have some (case-sensitivity) questions about the previous
> > > > patch which was already pushed [1].
> > > >
> > > > Q1. create_subscription docs
> > > >
> > > > I did not understand why the docs refer to slot_name = NONE, yet the
> > > > newly added option says origin = none/any. I think that saying origin
> > > > = NONE/ANY would be more consistent with the existing usage of NONE in
> > > > this documentation.
> > >
> > > Both NONE and none are ok in the case of origin, if you want I can
> > > change it to NONE/ANY in case of origin to keep it consistent.
> > >
> >
> > I preferred the special origin values should be documented as NONE/ANY
> > for better consistency, but let's see what others think about it.
> >
> > There will also be associated minor changes needed for a few
> > error/hint messages.
> >
>
> I am not really sure how much we gain by maintaining consistency with
> slot_name because if due to this we have to change the error messages
> as well then it can create an inconsistency with reserved origin
> names. Consider message: DETAIL:  Origin names "any", "none", and
> names starting with "pg_" are reserved. Now, if we change this to
> "ANY", "NONE" in the above message, it will look a bit odd as "pg_"
> starts with lower case letters.
>

Sure, the message looks a bit odd with the quotes like you wrote
above, but I would not suggest to change it that way - I was thinking
more like below (which is similar to the style the slot_name messages
use)

CURRENT
DETAIL:  Origin names "any", "none", and names starting with "pg_" are reserved.

SUGGESTED
DETAIL:  Origin names ANY, NONE, and names starting with "pg_" are reserved.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Amit Kapila
On Tue, Jul 26, 2022 at 5:04 AM Peter Smith  wrote:
>
> On Mon, Jul 25, 2022 at 7:33 PM vignesh C  wrote:
> >
> > On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
> > >
> > > Firstly, I have some (case-sensitivity) questions about the previous
> > > patch which was already pushed [1].
> > >
> > > Q1. create_subscription docs
> > >
> > > I did not understand why the docs refer to slot_name = NONE, yet the
> > > newly added option says origin = none/any. I think that saying origin
> > > = NONE/ANY would be more consistent with the existing usage of NONE in
> > > this documentation.
> >
> > Both NONE and none are ok in the case of origin, if you want I can
> > change it to NONE/ANY in case of origin to keep it consistent.
> >
>
> I preferred the special origin values should be documented as NONE/ANY
> for better consistency, but let's see what others think about it.
>
> There will also be associated minor changes needed for a few
> error/hint messages.
>

I am not really sure how much we gain by maintaining consistency with
slot_name because if due to this we have to change the error messages
as well then it can create an inconsistency with reserved origin
names. Consider message: DETAIL:  Origin names "any", "none", and
names starting with "pg_" are reserved. Now, if we change this to
"ANY", "NONE" in the above message, it will look a bit odd as "pg_"
starts with lower case letters.

--
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Amit Kapila
On Tue, Jul 26, 2022 at 7:48 AM Peter Smith  wrote:
>
> On Tue, Jul 26, 2022 at 11:43 AM Jonathan S. Katz  
> wrote:
> >
> ...
> > That said, this introduces a new restriction for this particular
> > scenario that doesn't exist on other scenarios. Instead, I would
> > advocate we document how to correctly set up the two-way replication
> > scenario (which we have a draft!), document the warnings around the
> > conflicts, perhaps include Vignesh's instructions on how to remediate a
> > conflict on initial sync, and consider throwing a WARNING as you suggested.
> >
> > Thoughts?
>
> Perhaps a WARNING can be useful if the SUBSCRIPTION was created with
> enabled=false because then the user still has a chance to reconsider,
>

Agreed. I think but in that case, when the user enables it, we need to
ensure that we won't allow replicating (during initial sync) remote
data. If this is really required/preferred, it can be done as a
separate enhancement.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Amit Kapila
On Tue, Jul 26, 2022 at 7:13 AM Jonathan S. Katz  wrote:
>
> On 7/25/22 4:54 AM, vignesh C wrote:
> >
> > Let's take a simple case to understand why copy_data = force is
> > required to replicate between two primaries for table t1 which has
> > data as given below:
>
> > step4 - Node-2:
> > Create Subscription sub2 Connection '' Publication
> > pub1_2 with (origin = none, copy_data=on);
> > If we had allowed the create subscription to be successful with
> > copy_data = on. After this the data will be something like this:
> > Node-1:
> > 1, 2, 3, 4, 5, 6, 7, 8
> >
> > Node-2:
> > 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8
> >
> > So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
> > case, table t1 has a unique key, it will lead to a unique key
> > violation and replication won't proceed.
> >
> > To avoid this we will throw an error:
> > ERROR:  could not replicate table "public.t1"
> > DETAIL:  CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
> > on is not allowed when the publisher has subscribed same table.
> > HINT:  Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.
>
> Thanks for the example. I agree that it is fairly simple to reproduce.
>
> I understand that "copy_data = force" is meant to protect a user from
> hurting themself. I'm not convinced that this is the best way to do so.
>
> For example today I can subscribe to multiple publications that write to
> the same table. If I have a primary key on that table, and two of the
> subscriptions try to write an identical ID, we conflict. We don't have
> any special flags or modes to guard against that from happening, though
> we do have documentation on conflicts and managing them.
>
> AFAICT the same issue with "copy_data" also exists in the above scenario
> too, even without the "origin" attribute.
>

That's true but there is no parameter like origin = NONE which
indicates that constraint violations or duplicate data problems won't
occur due to replication. In the current case, I think the situation
is different because a user has specifically asked not to replicate
any remote data by specifying origin = NONE, which should be dealt
differently. Note that current users or their setup won't see any
difference/change unless they specify the new parameter origin as
NONE.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
On Tue, Jul 26, 2022 at 11:43 AM Jonathan S. Katz  wrote:
>
...
> That said, this introduces a new restriction for this particular
> scenario that doesn't exist on other scenarios. Instead, I would
> advocate we document how to correctly set up the two-way replication
> scenario (which we have a draft!), document the warnings around the
> conflicts, perhaps include Vignesh's instructions on how to remediate a
> conflict on initial sync, and consider throwing a WARNING as you suggested.
>
> Thoughts?

Perhaps a WARNING can be useful if the SUBSCRIPTION was created with
enabled=false because then the user still has a chance to reconsider,
but otherwise, I don't see what good a warning does if the potentially
harmful initial copy is going to proceed anyway; isn't that like
putting a warning sign at the bottom of a cliff?

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
Here are some review comments for the patch v38-0002:

==

 - terminology

There seemed to be an inconsistent alternation of the terms
"primaries" and "nodes"... For example "Setting replication between
two primaries" versus "Adding a new node..." (instead of "Adding a new
primary..."?). I have included suggested minor rewording changes in
the review comments below, but please check in case I miss something.
Because I suggested changes to some titles maybe you will also want to
change the section ids too.

~~~

1. Commit message

The documentation was recently modified to remove the term
"bidirectional replication" and replace it all with "replication
between primaries", so this commit message (and also the patch name
itself) should be similarly modified.

~~~

2.
+   
+Replication between primaries is useful for creating a multi-master
+database environment for replicating write operations performed by any of
+the member nodes. The steps to create replication between primaries in
+various scenarios are given below. Note: User is responsible for designing
+their schemas in a way to minimize the risk of conflicts. See
+ for the details of logical
+replication conflicts. The logical replication restrictions applies to
+the replication between primaries also. See
+ for the details of
+logical replication restrictions.
+   

2a.
"User" -> "The user"

2b.
"The logical replication restrictions applies to..." --> "The logical
replication restrictions apply to..."

2c.
These are important notes. Instead of just being part of the text
blurb, perhaps these should be rendered as SGML  (or put them
both in a single  if you want)

~~~

3. Setting replication between two primaries

+   Setting replication between two primaries
+   
+The following steps demonstrate how to setup replication between two
+primaries when there is no table data present on both nodes
+node1 and node2:
+   

SUGGESTED
The following steps demonstrate how to set up replication between two
primaries (node1 and node2) when there is no table data present on
both nodes:

~~~

4.
+   
+Now the replication setup between two primaries node1
+and node2 is complete. Any incremental changes from
+node1 will be replicated to node2,
+and any incremental changes from node2 will be
+replicated to node1.
+   

"between two primaries" -> "between primaries"

~~~

5. Adding a new node when there is no table data on any of the nodes

SUGGESTION (title)
Adding a new primary when there is no table data on any of the primaries

~~~

6.
+   
+The following steps demonstrate adding a new node node3
+to the existing node1 and node2 when
+there is no t1 data on any of the nodes. This requires

SUGGESTION
The following steps demonstrate adding a new primary (node3) to the
existing primaries (node1 and node2) when there is no t1 data on any
of the nodes.

~~~

7. Adding a new node when table data is present on the existing nodes

SUGGESTION (title)
Adding a new primary when table data is present on the existing primaries

~~~

8.
+
+ The following steps demonstrate adding a new node node3
+ which has no t1 data to the existing
+ node1 and node2 where
+ t1 data is present. This needs similar steps; the only

SUGGESTION
The following steps demonstrate adding a new primary (node3) that has
no t1 data to the existing primaries (node1 and node2) where t1 data
is present.

~~~

9. Adding a new node when table data is present on the new node

SUGGESTION (title)
Adding a new primary that has existing table data

~~~

10.
+   
+
+ Adding a new node when table data is present on the new node is not
+ supported.
+
+   

SUGGESTION
Adding a new primary that has existing table data is not supported.

~~~

11. Generic steps for adding a new node to an existing set of primaries

SUGGESTION (title)
Generic steps for adding a new primary to an existing set of primaries

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Jonathan S. Katz

On 7/25/22 4:54 AM, vignesh C wrote:

On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz  wrote:


On 7/22/22 12:47 AM, Amit Kapila wrote:

On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz  wrote:



BTW, do you have any opinion on the idea of the first remaining patch
where we accomplish two things: a) Checks and throws an error if
'copy_data = on' and 'origin = none' but the publication tables were
also replicated from other publishers. b) Adds 'force' value for
copy_data parameter to allow copying in such a case. The primary
reason for this patch is to avoid loops or duplicate data in the
initial phase. We can't skip copying based on origin as we can do
while replicating changes from WAL. So, we detect that the publisher
already has data from some other node and doesn't allow replication
unless the user uses the 'force' option for copy_data.


In general, I agree with the patch; but I'm not sure why we are calling
"copy_data = force" in this case and how it varies from "on". I
understand the goal is to prevent the infinite loop, but is there some
technical restriction why we can't set "origin = none, copy_data = on"
and have this work (and apologies if I missed that upthread)?


Let's take a simple case to understand why copy_data = force is
required to replicate between two primaries for table t1 which has
data as given below:



step4 - Node-2:
Create Subscription sub2 Connection '' Publication
pub1_2 with (origin = none, copy_data=on);
If we had allowed the create subscription to be successful with
copy_data = on. After this the data will be something like this:
Node-1:
1, 2, 3, 4, 5, 6, 7, 8

Node-2:
1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8

So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
case, table t1 has a unique key, it will lead to a unique key
violation and replication won't proceed.

To avoid this we will throw an error:
ERROR:  could not replicate table "public.t1"
DETAIL:  CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
on is not allowed when the publisher has subscribed same table.
HINT:  Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.


Thanks for the example. I agree that it is fairly simple to reproduce.

I understand that "copy_data = force" is meant to protect a user from 
hurting themself. I'm not convinced that this is the best way to do so.


For example today I can subscribe to multiple publications that write to 
the same table. If I have a primary key on that table, and two of the 
subscriptions try to write an identical ID, we conflict. We don't have 
any special flags or modes to guard against that from happening, though 
we do have documentation on conflicts and managing them.


AFAICT the same issue with "copy_data" also exists in the above scenario 
too, even without the "origin" attribute. However, I think this case is 
more noticeable for "origin=none" because we currently default 
"copy_data" to "true" and in this case data can be copied in two 
directions.


That said, this introduces a new restriction for this particular 
scenario that doesn't exist on other scenarios. Instead, I would 
advocate we document how to correctly set up the two-way replication 
scenario (which we have a draft!), document the warnings around the 
conflicts, perhaps include Vignesh's instructions on how to remediate a 
conflict on initial sync, and consider throwing a WARNING as you suggested.


Thoughts?

Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
On Mon, Jul 25, 2022 at 7:33 PM vignesh C  wrote:
>
> On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
> >
> > Firstly, I have some (case-sensitivity) questions about the previous
> > patch which was already pushed [1].
> >
> > Q1. create_subscription docs
> >
> > I did not understand why the docs refer to slot_name = NONE, yet the
> > newly added option says origin = none/any. I think that saying origin
> > = NONE/ANY would be more consistent with the existing usage of NONE in
> > this documentation.
>
> Both NONE and none are ok in the case of origin, if you want I can
> change it to NONE/ANY in case of origin to keep it consistent.
>

I preferred the special origin values should be documented as NONE/ANY
for better consistency, but let's see what others think about it.

There will also be associated minor changes needed for a few
error/hint messages.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
On Mon, Jul 25, 2022 at 6:54 PM Amit Kapila  wrote:
>
> On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
> >
...
> >
> > Q2. parse_subscription_options
> >
> > Similarly, in the code (parse_subscription_options), I did not
> > understand why the checks for special name values are implemented
> > differently:
> >
> > The new 'origin' code is using pg_strcmpcase to check special values
> > (none/any), and the old 'slot_name' code uses case-sensitive strcmp to
> > check the special value (none).
> >
>
> We have a restriction for slot_names for lower case letters (aka
> "Replication slot names may only contain lower case letters, numbers,
> and the underscore character.") whereas there is no such restriction
> in origin name, that's why the check is different. So, if you try with
> slot_name = 'NONE', you will get the error.
>

2022-07-26 09:06:06.380 AEST [3630] STATEMENT:  create subscription
mysub connection 'host=localhost' publication mypub with
(slot_name='None', enabled=false, create_slot=false);
ERROR:  replication slot name "None" contains invalid character
HINT:  Replication slot names may only contain lower case letters,
numbers, and the underscore character.

You are right. Thanks for the explanation.

(Aside: Probably that error message wording ought to say "contains
invalid characters" instead of "contains invalid character")

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread vignesh C
On Mon, Jul 25, 2022 at 2:24 PM vignesh C  wrote:
>
> On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz  
> wrote:
> >
> > On 7/22/22 12:47 AM, Amit Kapila wrote:
> > > On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz  
> > > wrote:
> >
> > >> 1. I'm concerned by calling this "Bidirectional replication" in the docs
> > >> that we are overstating the current capabilities. I think this is
> > >> accentuated int he opening paragraph:
> > >>
> > >> ==snip==
> > >>Bidirectional replication is useful for creating a multi-master 
> > >> database
> > >>environment for replicating read/write operations performed by any of 
> > >> the
> > >>member nodes.
> > >> ==snip==
> > >>
> > >> For one, we're not replicating reads, we're replicating writes. Amongst
> > >> the writes, at this point we're only replicating DML. A reader could
> > >> think that deploying can work for a full bidirectional solution.
> > >>
> > >> (Even if we're aspirationally calling this section "Bidirectional
> > >> replication", that does make it sound like we're limited to two nodes,
> > >> when we can support more than two).
> > >>
> > >
> > > Right, I think the system can support N-Way replication.
> >
> > I did some more testing of the feature, i.e. doing 3-node and 4-node
> > tests. While logical replication today can handle replicating between
> > multiple nodes (N-way), the "origin = none" does require setting up
> > subscribers between each of the nodes.
> >
> > For example, if I have 4 nodes A, B, C, D and I want to replicate the
> > same table between all of them, I need to set up subscribers between all
> > of them (A<=>B, A<=>C, A<=>D, B<=>C, B<=>D, C<=>D). However, each node
> > can replicate between each other in a way that's convenient (vs. having
> > to do something funky with partitions) so this is still a big step forward.
> >
> > This is a long way of saying that I do think it's fair to say we support
> > "N-way" replication so long as you are set up in a mesh (vs. a ring,
> > e.g. A=>B=>C=>D=>A).
> >
> > > Among the above "Replicating changes between primaries" sounds good to
> > > me or simply "Replication between primaries". As this is a sub-section
> > > on the Logical Replication page, I feel it is okay to not use Logical
> > > in the title.
> >
> > Agreed, I think that's fine.
> >
> > >> At a minimum, I think we should reference the documentation we have in
> > >> the logical replication section on conflicts. We may also want to advise
> > >> that a user is responsible for designing their schemas in a way to
> > >> minimize the risk of conflicts.
> > >>
> > >
> > > This sounds reasonable to me.
> > >
> > > One more point about docs, it appears to be added as the last
> > > sub-section on the Logical Replication page. Is there a reason for
> > > doing so? I feel this should be third sub-section after describing
> > > Publication and Subscription.
> >
> > When I first reviewed, I had not built the docs. Did so on this pass.
> >
> > I agree with the positioning argument, i.e. it should go after
> > "Subscription" in the table of contents -- but it makes me question a
> > couple of things:
> >
> > 1. The general ordering of the docs
> > 2. How we describe that section (more on that in a sec)
> > 3. If "row filters" should be part of "subscription" instead of its own
> > section.
> >
> > If you look at the current order, "Quick setup" is the last section; one
> > would think the "quick" portion goes first :) Given a lot of this is for
> > the current docs, I may start a separate discussion on -docs for this part.
> >
> > For the time being, I agree it should be moved to the section after
> > "Subscription".
> >
> > I think what this section describes is "Configuring Replication Between
> > Nodes" as it covers a few different scenarios.
> >
> > I do think we need to iterate on these docs -- the examples with the
> > commands are generally OK and easy to follow, but a few things I noticed:
> >
> > 1. The general description of the section needs work. We may want to
> > refine the description of the use cases, and in the warning, link to
> > instructions on how to take backups.
> > 2. We put the "case not supported" in the middle, not at the end.
> > 3. The "generic steps for adding a new node..." section uses a
> > convention for steps that is not found in the docs. We also don't
> > provide an example for this section, and this is the most complicated
> > scenario to set up.
> >
> > I may be able to propose some suggestions in a few days.
> >
> > > BTW, do you have any opinion on the idea of the first remaining patch
> > > where we accomplish two things: a) Checks and throws an error if
> > > 'copy_data = on' and 'origin = none' but the publication tables were
> > > also replicated from other publishers. b) Adds 'force' value for
> > > copy_data parameter to allow copying in such a case. The primary
> > > reason for this patch is to avoid loops or duplicate data in the
> > > initial phase. We can't skip copying base

Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread vignesh C
On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
>
> Firstly, I have some (case-sensitivity) questions about the previous
> patch which was already pushed [1].
>
> Q1. create_subscription docs
>
> I did not understand why the docs refer to slot_name = NONE, yet the
> newly added option says origin = none/any. I think that saying origin
> = NONE/ANY would be more consistent with the existing usage of NONE in
> this documentation.

Both NONE and none are ok in the case of origin, if you want I can
change it to NONE/ANY in case of origin to keep it consistent.

Regards,
Vignesh




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Amit Kapila
On Mon, Jul 25, 2022 at 12:58 PM Peter Smith  wrote:
>
> Firstly, I have some (case-sensitivity) questions about the previous
> patch which was already pushed [1].
>
> Q1. create_subscription docs
>
> I did not understand why the docs refer to slot_name = NONE, yet the
> newly added option says origin = none/any. I think that saying origin
> = NONE/ANY would be more consistent with the existing usage of NONE in
> this documentation.
>
> ~~~
>
> Q2. parse_subscription_options
>
> Similarly, in the code (parse_subscription_options), I did not
> understand why the checks for special name values are implemented
> differently:
>
> The new 'origin' code is using pg_strcmpcase to check special values
> (none/any), and the old 'slot_name' code uses case-sensitive strcmp to
> check the special value (none).
>

We have a restriction for slot_names for lower case letters (aka
"Replication slot names may only contain lower case letters, numbers,
and the underscore character.") whereas there is no such restriction
in origin name, that's why the check is different. So, if you try with
slot_name = 'NONE', you will get the error.

-- 
With Regards,
Amit Kapila.




Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread vignesh C
On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz  wrote:
>
> On 7/22/22 12:47 AM, Amit Kapila wrote:
> > On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz  
> > wrote:
>
> >> 1. I'm concerned by calling this "Bidirectional replication" in the docs
> >> that we are overstating the current capabilities. I think this is
> >> accentuated int he opening paragraph:
> >>
> >> ==snip==
> >>Bidirectional replication is useful for creating a multi-master database
> >>environment for replicating read/write operations performed by any of 
> >> the
> >>member nodes.
> >> ==snip==
> >>
> >> For one, we're not replicating reads, we're replicating writes. Amongst
> >> the writes, at this point we're only replicating DML. A reader could
> >> think that deploying can work for a full bidirectional solution.
> >>
> >> (Even if we're aspirationally calling this section "Bidirectional
> >> replication", that does make it sound like we're limited to two nodes,
> >> when we can support more than two).
> >>
> >
> > Right, I think the system can support N-Way replication.
>
> I did some more testing of the feature, i.e. doing 3-node and 4-node
> tests. While logical replication today can handle replicating between
> multiple nodes (N-way), the "origin = none" does require setting up
> subscribers between each of the nodes.
>
> For example, if I have 4 nodes A, B, C, D and I want to replicate the
> same table between all of them, I need to set up subscribers between all
> of them (A<=>B, A<=>C, A<=>D, B<=>C, B<=>D, C<=>D). However, each node
> can replicate between each other in a way that's convenient (vs. having
> to do something funky with partitions) so this is still a big step forward.
>
> This is a long way of saying that I do think it's fair to say we support
> "N-way" replication so long as you are set up in a mesh (vs. a ring,
> e.g. A=>B=>C=>D=>A).
>
> > Among the above "Replicating changes between primaries" sounds good to
> > me or simply "Replication between primaries". As this is a sub-section
> > on the Logical Replication page, I feel it is okay to not use Logical
> > in the title.
>
> Agreed, I think that's fine.
>
> >> At a minimum, I think we should reference the documentation we have in
> >> the logical replication section on conflicts. We may also want to advise
> >> that a user is responsible for designing their schemas in a way to
> >> minimize the risk of conflicts.
> >>
> >
> > This sounds reasonable to me.
> >
> > One more point about docs, it appears to be added as the last
> > sub-section on the Logical Replication page. Is there a reason for
> > doing so? I feel this should be third sub-section after describing
> > Publication and Subscription.
>
> When I first reviewed, I had not built the docs. Did so on this pass.
>
> I agree with the positioning argument, i.e. it should go after
> "Subscription" in the table of contents -- but it makes me question a
> couple of things:
>
> 1. The general ordering of the docs
> 2. How we describe that section (more on that in a sec)
> 3. If "row filters" should be part of "subscription" instead of its own
> section.
>
> If you look at the current order, "Quick setup" is the last section; one
> would think the "quick" portion goes first :) Given a lot of this is for
> the current docs, I may start a separate discussion on -docs for this part.
>
> For the time being, I agree it should be moved to the section after
> "Subscription".
>
> I think what this section describes is "Configuring Replication Between
> Nodes" as it covers a few different scenarios.
>
> I do think we need to iterate on these docs -- the examples with the
> commands are generally OK and easy to follow, but a few things I noticed:
>
> 1. The general description of the section needs work. We may want to
> refine the description of the use cases, and in the warning, link to
> instructions on how to take backups.
> 2. We put the "case not supported" in the middle, not at the end.
> 3. The "generic steps for adding a new node..." section uses a
> convention for steps that is not found in the docs. We also don't
> provide an example for this section, and this is the most complicated
> scenario to set up.
>
> I may be able to propose some suggestions in a few days.
>
> > BTW, do you have any opinion on the idea of the first remaining patch
> > where we accomplish two things: a) Checks and throws an error if
> > 'copy_data = on' and 'origin = none' but the publication tables were
> > also replicated from other publishers. b) Adds 'force' value for
> > copy_data parameter to allow copying in such a case. The primary
> > reason for this patch is to avoid loops or duplicate data in the
> > initial phase. We can't skip copying based on origin as we can do
> > while replicating changes from WAL. So, we detect that the publisher
> > already has data from some other node and doesn't allow replication
> > unless the user uses the 'force' option for copy_data.
>
> In general, I agree with the patch; but 

Re: Handle infinite recursion in logical replication setup

2022-07-25 Thread Peter Smith
Firstly, I have some (case-sensitivity) questions about the previous
patch which was already pushed [1].

Q1. create_subscription docs

I did not understand why the docs refer to slot_name = NONE, yet the
newly added option says origin = none/any. I think that saying origin
= NONE/ANY would be more consistent with the existing usage of NONE in
this documentation.

~~~

Q2. parse_subscription_options

Similarly, in the code (parse_subscription_options), I did not
understand why the checks for special name values are implemented
differently:

The new 'origin' code is using pg_strcmpcase to check special values
(none/any), and the old 'slot_name' code uses case-sensitive strcmp to
check the special value (none).

FWIW, here I thought the new origin code is the correct one.

==

Now, here are some review comments for the patch v38-0001:

1. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed

@@ -1781,6 +1858,121 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
  table_close(rel, RowExclusiveLock);
 }

+/*
+ * Check and throw an error if the publisher has subscribed to the same table
+ * from some other publisher. This check is required only if copydata is ON and
+ * the origin is local for CREATE SUBSCRIPTION and
+ * ALTER SUBSCRIPTION ... REFRESH statements to avoid replicating remote data
+ * from the publisher.
+ *
+ * This check need not be performed on the tables that are already added as
+ * incremental sync for such tables will happen through WAL and the origin of
+ * the data can be identified from the WAL records.
+ *
+ * subrel_local_oids contains the list of relation oids that are already
+ * present on the subscriber.
+ */
+static void
+check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
+CopyData copydata, char *origin,
+Oid *subrel_local_oids, int subrel_count)

1a.
"copydata is ON" --> "copy_data = on" (because the comment is talking
about the CREATE/ALTER statements, so it seemed a bit confusing to
refer to the copydata function param instead of the copy_data
subscription parameter)

1b.
"the origin is local" ?? But, "local" was the old special name value.
Now it is "none", so I think this part needs minor rewording.

~~~

2.

+ if (copydata != COPY_DATA_ON || !origin ||
+ (pg_strcasecmp(origin, "none") != 0))
+ return;

Should this be using the constant LOGICALREP_ORIGIN_NONE?

~~~

3.

+ /*
+ * Throw an error if the publisher has subscribed to the same table
+ * from some other publisher. We cannot differentiate between the
+ * origin and non-origin data that is present in the HEAP during the
+ * initial sync. Identification of non-origin data can be done only
+ * from the WAL by using the origin id.
+ *
+ * XXX: For simplicity, we don't check whether the table has any data
+ * or not. If the table doesn't have any data then we don't need to
+ * distinguish between local and non-local data so we can avoid
+ * throwing an error in that case.
+ */

3a.
When the special origin value changed from "local" to "none" this
comment's first part seems to have got a bit lost in translation.

SUGGESTION:
Throw an error if the publisher has subscribed to the same table from
some other publisher. We cannot know the origin of data during the
initial sync. Data origins can be found only from the WAL by looking
at the origin id.

3b.
I think referring to "local and non-local" data in the XXX part of
this comment also needs some minor rewording now that "local" is not a
special origin name anymore.

--
[1] 
https://github.com/postgres/postgres/commit/366283961ac0ed6d8901c6090f3fd02fce0a

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Handle infinite recursion in logical replication setup

2022-07-24 Thread Amit Kapila
On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz  wrote:
>
> On 7/22/22 12:47 AM, Amit Kapila wrote:
> > On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz  
> > wrote:
>
> >> 1. I'm concerned by calling this "Bidirectional replication" in the docs
> >> that we are overstating the current capabilities. I think this is
> >> accentuated int he opening paragraph:
> >>
> >> ==snip==
> >>Bidirectional replication is useful for creating a multi-master database
> >>environment for replicating read/write operations performed by any of 
> >> the
> >>member nodes.
> >> ==snip==
> >>
> >> For one, we're not replicating reads, we're replicating writes. Amongst
> >> the writes, at this point we're only replicating DML. A reader could
> >> think that deploying can work for a full bidirectional solution.
> >>
> >> (Even if we're aspirationally calling this section "Bidirectional
> >> replication", that does make it sound like we're limited to two nodes,
> >> when we can support more than two).
> >>
> >
> > Right, I think the system can support N-Way replication.
>
> I did some more testing of the feature, i.e. doing 3-node and 4-node
> tests. While logical replication today can handle replicating between
> multiple nodes (N-way), the "origin = none" does require setting up
> subscribers between each of the nodes.
>
> For example, if I have 4 nodes A, B, C, D and I want to replicate the
> same table between all of them, I need to set up subscribers between all
> of them (A<=>B, A<=>C, A<=>D, B<=>C, B<=>D, C<=>D). However, each node
> can replicate between each other in a way that's convenient (vs. having
> to do something funky with partitions) so this is still a big step forward.
>
> This is a long way of saying that I do think it's fair to say we support
> "N-way" replication so long as you are set up in a mesh (vs. a ring,
> e.g. A=>B=>C=>D=>A).
>

Sorry, but I don't get your point of mesh vs. ring? I think with some
care users can set up replication even in a ring topology.

> > Among the above "Replicating changes between primaries" sounds good to
> > me or simply "Replication between primaries". As this is a sub-section
> > on the Logical Replication page, I feel it is okay to not use Logical
> > in the title.
>
> Agreed, I think that's fine.
>
> >> At a minimum, I think we should reference the documentation we have in
> >> the logical replication section on conflicts. We may also want to advise
> >> that a user is responsible for designing their schemas in a way to
> >> minimize the risk of conflicts.
> >>
> >
> > This sounds reasonable to me.
> >
> > One more point about docs, it appears to be added as the last
> > sub-section on the Logical Replication page. Is there a reason for
> > doing so? I feel this should be third sub-section after describing
> > Publication and Subscription.
>
> When I first reviewed, I had not built the docs. Did so on this pass.
>
> I agree with the positioning argument, i.e. it should go after
> "Subscription" in the table of contents -- but it makes me question a
> couple of things:
>
> 1. The general ordering of the docs
> 2. How we describe that section (more on that in a sec)
> 3. If "row filters" should be part of "subscription" instead of its own
> section.
>

I don't think it is a good idea to keep "row filters" as a part of
subscription because we define those at publisher but there are
certain things like initial sync or combining of row filters that are
related to subscriptions. So, probably having it in a separate
sub-section seems okay to me. I have also thought about keeping it as
a part of Publication or Subscription but left it due to the reasons
mentioned.

> If you look at the current order, "Quick setup" is the last section; one
> would think the "quick" portion goes first :) Given a lot of this is for
> the current docs, I may start a separate discussion on -docs for this part.
>
> For the time being, I agree it should be moved to the section after
> "Subscription".
>

Okay, thanks!

> I think what this section describes is "Configuring Replication Between
> Nodes" as it covers a few different scenarios.
>
> I do think we need to iterate on these docs -- the examples with the
> commands are generally OK and easy to follow, but a few things I noticed:
>
> 1. The general description of the section needs work. We may want to
> refine the description of the use cases, and in the warning, link to
> instructions on how to take backups.
> 2. We put the "case not supported" in the middle, not at the end.
> 3. The "generic steps for adding a new node..." section uses a
> convention for steps that is not found in the docs. We also don't
> provide an example for this section, and this is the most complicated
> scenario to set up.
>

I agree with all these points, especially your point related to a
complicated setup. This won't be easy for users without a very clear
description and examples. However, after this work, it will at least
be possible for users to set up an

Re: Handle infinite recursion in logical replication setup

2022-07-24 Thread Jonathan S. Katz

On 7/22/22 12:47 AM, Amit Kapila wrote:

On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz  wrote:



1. I'm concerned by calling this "Bidirectional replication" in the docs
that we are overstating the current capabilities. I think this is
accentuated int he opening paragraph:

==snip==
   Bidirectional replication is useful for creating a multi-master database
   environment for replicating read/write operations performed by any of the
   member nodes.
==snip==

For one, we're not replicating reads, we're replicating writes. Amongst
the writes, at this point we're only replicating DML. A reader could
think that deploying can work for a full bidirectional solution.

(Even if we're aspirationally calling this section "Bidirectional
replication", that does make it sound like we're limited to two nodes,
when we can support more than two).



Right, I think the system can support N-Way replication.


I did some more testing of the feature, i.e. doing 3-node and 4-node 
tests. While logical replication today can handle replicating between 
multiple nodes (N-way), the "origin = none" does require setting up 
subscribers between each of the nodes.


For example, if I have 4 nodes A, B, C, D and I want to replicate the 
same table between all of them, I need to set up subscribers between all 
of them (A<=>B, A<=>C, A<=>D, B<=>C, B<=>D, C<=>D). However, each node 
can replicate between each other in a way that's convenient (vs. having 
to do something funky with partitions) so this is still a big step forward.


This is a long way of saying that I do think it's fair to say we support 
"N-way" replication so long as you are set up in a mesh (vs. a ring, 
e.g. A=>B=>C=>D=>A).



Among the above "Replicating changes between primaries" sounds good to
me or simply "Replication between primaries". As this is a sub-section
on the Logical Replication page, I feel it is okay to not use Logical
in the title.


Agreed, I think that's fine.


At a minimum, I think we should reference the documentation we have in
the logical replication section on conflicts. We may also want to advise
that a user is responsible for designing their schemas in a way to
minimize the risk of conflicts.



This sounds reasonable to me.

One more point about docs, it appears to be added as the last
sub-section on the Logical Replication page. Is there a reason for
doing so? I feel this should be third sub-section after describing
Publication and Subscription.


When I first reviewed, I had not built the docs. Did so on this pass.

I agree with the positioning argument, i.e. it should go after 
"Subscription" in the table of contents -- but it makes me question a 
couple of things:


1. The general ordering of the docs
2. How we describe that section (more on that in a sec)
3. If "row filters" should be part of "subscription" instead of its own 
section.


If you look at the current order, "Quick setup" is the last section; one 
would think the "quick" portion goes first :) Given a lot of this is for 
the current docs, I may start a separate discussion on -docs for this part.


For the time being, I agree it should be moved to the section after 
"Subscription".


I think what this section describes is "Configuring Replication Between 
Nodes" as it covers a few different scenarios.


I do think we need to iterate on these docs -- the examples with the 
commands are generally OK and easy to follow, but a few things I noticed:


1. The general description of the section needs work. We may want to 
refine the description of the use cases, and in the warning, link to 
instructions on how to take backups.

2. We put the "case not supported" in the middle, not at the end.
3. The "generic steps for adding a new node..." section uses a 
convention for steps that is not found in the docs. We also don't 
provide an example for this section, and this is the most complicated 
scenario to set up.


I may be able to propose some suggestions in a few days.


BTW, do you have any opinion on the idea of the first remaining patch
where we accomplish two things: a) Checks and throws an error if
'copy_data = on' and 'origin = none' but the publication tables were
also replicated from other publishers. b) Adds 'force' value for
copy_data parameter to allow copying in such a case. The primary
reason for this patch is to avoid loops or duplicate data in the
initial phase. We can't skip copying based on origin as we can do
while replicating changes from WAL. So, we detect that the publisher
already has data from some other node and doesn't allow replication
unless the user uses the 'force' option for copy_data.


In general, I agree with the patch; but I'm not sure why we are calling 
"copy_data = force" in this case and how it varies from "on". I 
understand the goal is to prevent the infinite loop, but is there some 
technical restriction why we can't set "origin = none, copy_data = on" 
and have this work (and apologies if I missed that upthread)?


The other concern I'll

  1   2   3   >