RE: Optionally automatically disable logical replication subscriptions on error

2022-03-14 Thread osumi.takami...@fujitsu.com
On Tuesday, March 15, 2022 8:04 AM Nathan Bossart  
wrote:
> My compiler is worried that syncslotname may be used uninitialized in
> start_table_sync().  The attached patch seems to silence this warning.
Thank you for your reporting !

Your fix looks good to me.


Best Regards,
Takamichi Osumi





Re: Optionally automatically disable logical replication subscriptions on error

2022-03-14 Thread Nathan Bossart
My compiler is worried that syncslotname may be used uninitialized in
start_table_sync().  The attached patch seems to silence this warning.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 05e4e03af5afa1658ede8d78b31c1c999b5c7deb Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 14 Mar 2022 16:00:18 -0700
Subject: [PATCH v1 1/1] silence compiler warning

---
 src/backend/replication/logical/worker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index a1fe81b34f..03e069c7cd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3387,7 +3387,7 @@ TwoPhaseTransactionGid(Oid subid, TransactionId xid, char *gid, int szgid)
 static void
 start_table_sync(XLogRecPtr *origin_startpos, char **myslotname)
 {
-	char	   *syncslotname;
+	char	   *syncslotname = NULL;
 
 	Assert(am_tablesync_worker());
 
-- 
2.25.1



RE: Optionally automatically disable logical replication subscriptions on error

2022-03-14 Thread osumi.takami...@fujitsu.com
On Monday, March 14, 2022 7:49 PM Amit Kapila  wrote:
> On Thu, Mar 10, 2022 at 12:04 PM Amit Kapila 
> wrote:
> >
> > On Wed, Mar 9, 2022 at 7:57 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > Hi, attached v32 removed my additional code for
> maybe_reread_subscription.
> > >
> >
> > Thanks, the patch looks good to me. I have made minor edits in the
> > attached. I am planning to commit this early next week (Monday) unless
> > there are any other major comments.
> >
> 
> Pushed.
Thank you so much !


Best Regards,
Takamichi Osumi



Re: Optionally automatically disable logical replication subscriptions on error

2022-03-14 Thread Amit Kapila
On Thu, Mar 10, 2022 at 12:04 PM Amit Kapila  wrote:
>
> On Wed, Mar 9, 2022 at 7:57 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Hi, attached v32 removed my additional code for maybe_reread_subscription.
> >
>
> Thanks, the patch looks good to me. I have made minor edits in the
> attached. I am planning to commit this early next week (Monday) unless
> there are any other major comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 7:57 PM osumi.takami...@fujitsu.com
 wrote:
>
> Hi, attached v32 removed my additional code for maybe_reread_subscription.
>

Thanks, the patch looks good to me. I have made minor edits in the
attached. I am planning to commit this early next week (Monday) unless
there are any other major comments.

-- 
With Regards,
Amit Kapila.


v33-0001-Optionally-disable-subscriptions-on-error.patch
Description: Binary data


RE: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread osumi.takami...@fujitsu.com
On Wednesday, March 9, 2022 8:22 PM Amit Kapila  wrote:
> On Wed, Mar 9, 2022 at 2:21 PM Masahiko Sawada
>  wrote:
> >
> > On Wed, Mar 9, 2022 at 4:33 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, March 8, 2022 10:23 PM Amit Kapila
>  wrote:
> > > > On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
> > > >  wrote:
> > > > >
> > >
> > >
> > > > 2. Is there a reason the patch doesn't allow workers to restart
> > > > via
> > > > maybe_reread_subscription() when this new option is changed, if
> > > > so, then let's add a comment for the same? We currently seem to be
> > > > restarting the worker on any change via Alter Subscription. If we
> > > > decide to change it for this option as well then I think we need
> > > > to accordingly update the current comment: "Exit if any parameter
> > > > that affects the remote connection was changed." to something like
> > > > "Exit if any parameter that affects the remote connection or a
> subscription option was changed..."
> > > I thought it's ok without the change at the beginning, but I was wrong.
> > > To make this new option aligned with others, I should add one check
> > > for this feature. Fixed.
> >
> > Why do we need to restart the apply worker when disable_on_error is
> > changed? It doesn't affect the remote connection at all. I think it
> > can be changed without restarting like synchronous_commit option.
> >
> 
> oh right, I thought that how will we update its value in MySubscription after 
> a
> change but as we re-read the pg_subscription table for the current
> subscription and update MySubscription, I feel we don't need to restart it. I
> haven't tested it but it should work without a restart.
Hi, attached v32 removed my additional code for maybe_reread_subscription.

Also, I judged that we don't need to add a comment for this feature in this 
patch.
It's because we can interpret this discussion from existing comments and codes.
(1) "Reread subscription info if needed. Most changes will be exit."
There are some cases we don't exit.
(2) Like "Exit if any parameter that affects the remote connection was 
changed.",
readers can understand no exit case matches the disable_on_error option 
change.

Kindly review the v32.

Best Regards,
Takamichi Osumi



v32-0001-Optionally-disable-subscriptions-on-error.patch
Description: v32-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread Amit Kapila
On Wed, Mar 9, 2022 at 2:21 PM Masahiko Sawada  wrote:
>
> On Wed, Mar 9, 2022 at 4:33 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Tuesday, March 8, 2022 10:23 PM Amit Kapila  
> > wrote:
> > > On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
> > >  wrote:
> > > >
> >
> >
> > > 2. Is there a reason the patch doesn't allow workers to restart via
> > > maybe_reread_subscription() when this new option is changed, if so, then 
> > > let's
> > > add a comment for the same? We currently seem to be restarting the worker 
> > > on
> > > any change via Alter Subscription. If we decide to change it for this 
> > > option as
> > > well then I think we need to accordingly update the current comment: 
> > > "Exit if
> > > any parameter that affects the remote connection was changed." to 
> > > something
> > > like "Exit if any parameter that affects the remote connection or a 
> > > subscription
> > > option was changed..."
> > I thought it's ok without the change at the beginning, but I was wrong.
> > To make this new option aligned with others, I should add one check
> > for this feature. Fixed.
>
> Why do we need to restart the apply worker when disable_on_error is
> changed? It doesn't affect the remote connection at all. I think it
> can be changed without restarting like synchronous_commit option.
>

oh right, I thought that how will we update its value in
MySubscription after a change but as we re-read the pg_subscription
table for the current subscription and update MySubscription, I feel
we don't need to restart it. I haven't tested it but it should work
without a restart.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-09 Thread Masahiko Sawada
On Wed, Mar 9, 2022 at 4:33 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Tuesday, March 8, 2022 10:23 PM Amit Kapila  
> wrote:
> > On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
>
>
> > 2. Is there a reason the patch doesn't allow workers to restart via
> > maybe_reread_subscription() when this new option is changed, if so, then 
> > let's
> > add a comment for the same? We currently seem to be restarting the worker on
> > any change via Alter Subscription. If we decide to change it for this 
> > option as
> > well then I think we need to accordingly update the current comment: "Exit 
> > if
> > any parameter that affects the remote connection was changed." to something
> > like "Exit if any parameter that affects the remote connection or a 
> > subscription
> > option was changed..."
> I thought it's ok without the change at the beginning, but I was wrong.
> To make this new option aligned with others, I should add one check
> for this feature. Fixed.

Why do we need to restart the apply worker when disable_on_error is
changed? It doesn't affect the remote connection at all. I think it
can be changed without restarting like synchronous_commit option.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread osumi.takami...@fujitsu.com
On Tuesday, March 8, 2022 10:23 PM Amit Kapila  wrote:
> On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Kindly have a look at v30.
> >
> 
> Review comments:
Thank you for checking !


> ===
> 1.
> + ereport(LOG,
> + errmsg("logical replication subscription \"%s\" has been be disabled
> due to an error",
> 
> Typo.
> /been be/been
Fixed.

 
> 2. Is there a reason the patch doesn't allow workers to restart via
> maybe_reread_subscription() when this new option is changed, if so, then let's
> add a comment for the same? We currently seem to be restarting the worker on
> any change via Alter Subscription. If we decide to change it for this option 
> as
> well then I think we need to accordingly update the current comment: "Exit if
> any parameter that affects the remote connection was changed." to something
> like "Exit if any parameter that affects the remote connection or a 
> subscription
> option was changed..."
I thought it's ok without the change at the beginning, but I was wrong.
To make this new option aligned with others, I should add one check
for this feature. Fixed.


> 3.
>   if (fout->remoteVersion >= 15)
> - appendPQExpBufferStr(query, " s.subtwophasestate\n");
> + appendPQExpBufferStr(query, " s.subtwophasestate,\n");
>   else
>   appendPQExpBuffer(query,
> -   " '%c' AS subtwophasestate\n",
> +   " '%c' AS subtwophasestate,\n",
> LOGICALREP_TWOPHASE_STATE_DISABLED);
> 
> + if (fout->remoteVersion >= 15)
> + appendPQExpBuffer(query, " s.subdisableonerr\n"); else
> + appendPQExpBuffer(query,
> +   " false AS subdisableonerr\n");
> 
> It is better to combine these parameters. I see there is a similar coding 
> pattern
> for 14 but I think that is not required.
Fixed and combined them together.

 
> 4.
> +$node_subscriber->safe_psql('postgres', qq(ALTER SUBSCRIPTION sub
> +ENABLE));
> +
> +# Wait for the data to replicate.
> +$node_subscriber->poll_query_until(
> + 'postgres', qq(
> +SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr WHERE
> +sr.srsubstate IN ('s', 'r') AND sr.srrelid = 'tbl'::regclass));
> 
> See other scripts like t/015_stream.pl and wait for data replication in the 
> same
> way. I think there are two things to change: (a) In the above query, we use 
> NOT
> IN at other places (b) use $node_publisher->wait_for_catchup before this
> query.
Fixed.

The new patch is shared in [1].

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373824855A6C4D2178027A0ED0A9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread osumi.takami...@fujitsu.com
On Wednesday, March 9, 2022 9:58 AM Masahiko Sawada  
wrote:
> On Tue, Mar 8, 2022 at 5:07 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Kindly have a look at v30.
> 
> Thank you for updating the patch. Here are some comments:
Hi, thank you for your review !


> +   /*
> +* Allocate the origin name in long-lived context for error context
> +* message.
> +*/
> +   ReplicationOriginNameForTablesync(MySubscription->oid,
> + MyLogicalRepWorker->relid,
> + originname,
> + sizeof(originname));
> +   apply_error_callback_arg.origin_name =
> MemoryContextStrdup(ApplyContext,
> +
> + originname);
> 
> I think it's better to set apply_error_callback_arg.origin_name in the caller
> rather than in start_table_sync(). Apply workers set
> apply_error_callback_arg.origin_name there and it's not necessarily necessary
> to do that in this function.
OK. I made this origin_name logic back to the level of ApplyWorkerMain.


The new patch v31 is shared in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373824855A6C4D2178027A0ED0A9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regardfs,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread osumi.takami...@fujitsu.com
On Wednesday, March 9, 2022 1:29 PM Amit Kapila  wrote:
> On Tue, Mar 8, 2022 at 6:53 PM Amit Kapila  wrote:
> >
> > On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > Kindly have a look at v30.
> > >
> >
> > Review comments:
> > ===
Thank you for reviewing !


> Few comments on test script:
> ===
> 1.
> +# This tests the uniqueness violation will cause the subscription # to
> +fail during initial synchronization and make it disabled.
> 
> /This tests the/This tests that the
Fixed.


> 2.
> +$node_publisher->safe_psql('postgres',
> + qq(CREATE PUBLICATION pub FOR TABLE tbl));
> +$node_subscriber->safe_psql(  'postgres', qq( CREATE SUBSCRIPTION
> sub
> +CONNECTION '$publisher_connstr'
> +PUBLICATION pub WITH (disable_on_error = true)));
> 
> Please check other test scripts like t/015_stream.pl or t/028_row_filter.pl 
> and
> keep the indentation of these commands similar. It looks odd and inconsistent
> with other tests. Also, we can use double-quotes instead of qq so as to be
> consistent with other scripts. Please check other similar places and make
> them consistent with other test script files.
Fixed the inconsistent indentations within each commands.
Also, replace the qq with double-quotes (except for the is()'s
2nd argument, which is the aligned way to write the tests).



> 3.
> +# Initial synchronization failure causes the subscription # to be
> +disabled.
> 
> Here and in other places in test scripts, the comment lines seem too short to
> me. Normally, we can keep it at the 80 char limit but this appears too short.
Fixed.


> 4.
> +# Delete the data from the subscriber and recreate the unique index.
> +$node_subscriber->safe_psql(
> + 'postgres', q(
> +DELETE FROM tbl;
> +CREATE UNIQUE INDEX tbl_unique ON tbl (i)));
> 
> In other tests, we are executing single statements via safe_psql. I don't see 
> a
> problem with this but also don't see a reason to deviate from the normal
> pattern.
Fixed.


At the same time, I fixed one comment
where I should write "subscriber", not "sub",
since in the entire test file, I express the subscriber
by using the former.


The new patch v31 is shared in [1].

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373824855A6C4D2178027A0ED0A9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread osumi.takami...@fujitsu.com
On Wednesday, March 9, 2022 3:02 PM Amit Kapila  wrote:
> On Wed, Mar 9, 2022 at 11:22 AM Masahiko Sawada
>  wrote:
> >
> > On Wed, Mar 9, 2022 at 12:37 PM Amit Kapila 
> wrote:
> > >
> > > On Wed, Mar 9, 2022 at 6:29 AM Masahiko Sawada
>  wrote:
> > > >
> > > > ---
> > > > It might have already been discussed but the worker disables the
> > > > subscription on an error but doesn't work for a fatal. Is that
> > > > expected or should we handle that too?
> > > >
> > >
> > > I am not too sure about handling FATALs with this feature because
> > > this is mainly to aid in resolving conflicts due to various
> > > constraints. It might be okay to retry in case of FATAL which is
> > > possibly due to some system resource error. OTOH, if we see that it
> > > will be good to disable for a FATAL error as well then I think we
> > > can use PG_ENSURE_ERROR_CLEANUP construct. What do you think?
> >
> > I think that since FATAL raised by logical replication workers (e.g.,
> > terminated by DDL or out of memory etc?) is normally not a repeatable
> > error, it's reasonable to retry in this case.
> >
> 
> Yeah, I think we can add a comment in the code for this so that future readers
> know that this has been done deliberately.
OK. I've added some comments in the codes.

The v31 addressed other comments on hackers so far.
(a) brush up the TAP test alignment
(b) fix the place of apply_error_callback_arg.origin_name for table sync worker
(c) modify maybe_reread_subscription to exit, when disable_on_error changes
(d) improve getSubscriptions to combine some branches for v15

Kindly check the attached v31.


Best Regards,
Takamichi Osumi



v31-0001-Optionally-disable-subscriptions-on-error.patch
Description: v31-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Amit Kapila
On Wed, Mar 9, 2022 at 11:22 AM Masahiko Sawada  wrote:
>
> On Wed, Mar 9, 2022 at 12:37 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 9, 2022 at 6:29 AM Masahiko Sawada  
> > wrote:
> > >
> > > ---
> > > It might have already been discussed but the worker disables the
> > > subscription on an error but doesn't work for a fatal. Is that
> > > expected or should we handle that too?
> > >
> >
> > I am not too sure about handling FATALs with this feature because this
> > is mainly to aid in resolving conflicts due to various constraints. It
> > might be okay to retry in case of FATAL which is possibly due to some
> > system resource error. OTOH, if we see that it will be good to disable
> > for a FATAL error as well then I think we can use
> > PG_ENSURE_ERROR_CLEANUP construct. What do you think?
>
> I think that since FATAL raised by logical replication workers (e.g.,
> terminated by DDL or out of memory etc?) is normally not a repeatable
> error, it's reasonable to retry in this case.
>

Yeah, I think we can add a comment in the code for this so that future
readers know that this has been done deliberately.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Masahiko Sawada
On Wed, Mar 9, 2022 at 12:37 PM Amit Kapila  wrote:
>
> On Wed, Mar 9, 2022 at 6:29 AM Masahiko Sawada  wrote:
> >
> > ---
> > It might have already been discussed but the worker disables the
> > subscription on an error but doesn't work for a fatal. Is that
> > expected or should we handle that too?
> >
>
> I am not too sure about handling FATALs with this feature because this
> is mainly to aid in resolving conflicts due to various constraints. It
> might be okay to retry in case of FATAL which is possibly due to some
> system resource error. OTOH, if we see that it will be good to disable
> for a FATAL error as well then I think we can use
> PG_ENSURE_ERROR_CLEANUP construct. What do you think?

I think that since FATAL raised by logical replication workers (e.g.,
terminated by DDL or out of memory etc?) is normally not a repeatable
error, it's reasonable to retry in this case.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Amit Kapila
On Tue, Mar 8, 2022 at 6:53 PM Amit Kapila  wrote:
>
> On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Kindly have a look at v30.
> >
>
> Review comments:
> ===
>
Few comments on test script:
===
1.
+# This tests the uniqueness violation will cause the subscription
+# to fail during initial synchronization and make it disabled.

/This tests the/This tests that the

2.
+$node_publisher->safe_psql('postgres',
+ qq(CREATE PUBLICATION pub FOR TABLE tbl));
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+CREATE SUBSCRIPTION sub
+CONNECTION '$publisher_connstr'
+PUBLICATION pub WITH (disable_on_error = true)));

Please check other test scripts like t/015_stream.pl or
t/028_row_filter.pl and keep the indentation of these commands
similar. It looks odd and inconsistent with other tests. Also, we can
use double-quotes instead of qq so as to be consistent with other
scripts. Please check other similar places and make them consistent
with other test script files.

3.
+# Initial synchronization failure causes the subscription
+# to be disabled.

Here and in other places in test scripts, the comment lines seem too
short to me. Normally, we can keep it at the 80 char limit but this
appears too short.

4.
+# Delete the data from the subscriber and recreate the unique index.
+$node_subscriber->safe_psql(
+ 'postgres', q(
+DELETE FROM tbl;
+CREATE UNIQUE INDEX tbl_unique ON tbl (i)));

In other tests, we are executing single statements via safe_psql. I
don't see a problem with this but also don't see a reason to deviate
from the normal pattern.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Amit Kapila
On Wed, Mar 9, 2022 at 6:29 AM Masahiko Sawada  wrote:
>
> ---
> It might have already been discussed but the worker disables the
> subscription on an error but doesn't work for a fatal. Is that
> expected or should we handle that too?
>

I am not too sure about handling FATALs with this feature because this
is mainly to aid in resolving conflicts due to various constraints. It
might be okay to retry in case of FATAL which is possibly due to some
system resource error. OTOH, if we see that it will be good to disable
for a FATAL error as well then I think we can use
PG_ENSURE_ERROR_CLEANUP construct. What do you think?

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Masahiko Sawada
On Tue, Mar 8, 2022 at 5:07 PM osumi.takami...@fujitsu.com
 wrote:
>
> Kindly have a look at v30.

Thank you for updating the patch. Here are some comments:

+   /*
+* Allocate the origin name in long-lived context for error context
+* message.
+*/
+   ReplicationOriginNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ originname,
+ sizeof(originname));
+   apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext,
+  originname);

I think it's better to set apply_error_callback_arg.origin_name in the
caller rather than in start_table_sync(). Apply workers set
apply_error_callback_arg.origin_name there and it's not necessarily
necessary to do that in this function.

Even if we want to do that, I think it's not necessary to pass
originname to start_table_sync(). It's a local variable and used only
to temporarily store the tablesync worker's origin name.

---
It might have already been discussed but the worker disables the
subscription on an error but doesn't work for a fatal. Is that
expected or should we handle that too?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread Amit Kapila
On Tue, Mar 8, 2022 at 1:37 PM osumi.takami...@fujitsu.com
 wrote:
>
> Kindly have a look at v30.
>

Review comments:
===
1.
+ ereport(LOG,
+ errmsg("logical replication subscription \"%s\" has been be disabled
due to an error",

Typo.
/been be/been

2. Is there a reason the patch doesn't allow workers to restart via
maybe_reread_subscription() when this new option is changed, if so,
then let's add a comment for the same? We currently seem to be
restarting the worker on any change via Alter Subscription. If we
decide to change it for this option as well then I think we need to
accordingly update the current comment: "Exit if any parameter that
affects the remote connection was changed." to something like "Exit if
any parameter that affects the remote connection or a subscription
option was changed..."

3.
  if (fout->remoteVersion >= 15)
- appendPQExpBufferStr(query, " s.subtwophasestate\n");
+ appendPQExpBufferStr(query, " s.subtwophasestate,\n");
  else
  appendPQExpBuffer(query,
-   " '%c' AS subtwophasestate\n",
+   " '%c' AS subtwophasestate,\n",
LOGICALREP_TWOPHASE_STATE_DISABLED);

+ if (fout->remoteVersion >= 15)
+ appendPQExpBuffer(query, " s.subdisableonerr\n");
+ else
+ appendPQExpBuffer(query,
+   " false AS subdisableonerr\n");

It is better to combine these parameters. I see there is a similar
coding pattern for 14 but I think that is not required.

4.
+$node_subscriber->safe_psql('postgres', qq(ALTER SUBSCRIPTION sub ENABLE));
+
+# Wait for the data to replicate.
+$node_subscriber->poll_query_until(
+ 'postgres', qq(
+SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr
+WHERE sr.srsubstate IN ('s', 'r') AND sr.srrelid = 'tbl'::regclass));

See other scripts like t/015_stream.pl and wait for data replication
in the same way. I think there are two things to change: (a) In the
above query, we use NOT IN at other places (b) use
$node_publisher->wait_for_catchup before this query.

-- 
With Regards,
Amit Kapila.




RE: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread osumi.takami...@fujitsu.com
On Tuesday, March 8, 2022 1:07 PM Peter Smith  wrote:
> Please find below some review comments for v29.
Thank you for your comments !


 
> ==
> 
> 1. src/backend/replication/logical/worker.c - worker_post_error_processing
> 
> +/*
> + * Abort and cleanup the current transaction, then do post-error processing.
> + * This function must be called in a PG_CATCH() block.
> + */
> +static void
> +worker_post_error_processing(void)
> 
> The function comment and function name are too vague/generic. I guess this is
> a hang-over from Sawada-san's proposed patch, but now since this is only
> called when disabling the subscription so both the comment and the function
> name should say that's what it is doing...
> 
> e.g. rename to DisableSubscriptionOnError() or something similar.
Fixed the comments and the function name in v30 shared in [1].




> ~~~
> 
> 2. src/backend/replication/logical/worker.c - worker_post_error_processing
> 
> + /* Notify the subscription has been disabled */ ereport(LOG,
> + errmsg("logical replication subscription \"%s\" has been be disabled
> due to an error",
> +MySubscription->name));
> 
>   proc_exit(0);
>  }
> 
> I know this is common code, but IMO it would be better to do the proc_exit(0);
> from the caller in the PG_CATCH. Then I think the code will be much easier to
> read the throw/exit logic, rather than now where it is just calling some 
> function
> that never returns...
> 
> Alternatively, if you want the code how it is, then the function name should 
> give
> some hint that it is never going to return - e.g.
> DisableSubscriptionOnErrorAndExit)
I renamed it to DisableSubscriptionAndExit in the end
according to the discussion.



> ~~~
> 
> 3. src/backend/replication/logical/worker.c - start_table_sync
> 
> + {
> + /*
> + * Abort the current transaction so that we send the stats message
> + * in an idle state.
> + */
> + AbortOutOfAnyTransaction();
> +
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid, false);
> +
> + PG_RE_THROW();
> + }
> 
> (This is a repeat of a previous comment from [1] comment #2)
> 
> I felt the separation of those 2 statements and comments makes the code less
> clean than it could/should be. IMO they should be grouped together.
> 
> SUGGESTED
> 
> /*
> * Report the worker failed during table synchronization. Abort the
> * current transaction so that the stats message is sent in an idle
> * state.
> */
> AbortOutOfAnyTransaction();
> pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_work
> er());
Fixed.



> ~~~
> 
> 4. src/backend/replication/logical/worker.c - start_apply
> 
> + {
> + /*
> + * Abort the current transaction so that we send the stats message
> + * in an idle state.
> + */
> + AbortOutOfAnyTransaction();
> +
> + /* Report the worker failed while applying changes */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> +
> + PG_RE_THROW();
> + }
> 
> (same as #3 but comment says "while applying changes")
> 
> SUGGESTED
> 
> /*
> * Report the worker failed while applying changing. Abort the current
> * transaction so that the stats message is sent in an idle state.
> */
> AbortOutOfAnyTransaction();
> pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_work
> er());
Fixed. I choose the woring "while applying changes" which you mentioned first
and sounds more natural.


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373B74627C6BAF2F146D779ED099%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2022-03-08 Thread osumi.takami...@fujitsu.com
On Tuesday, March 8, 2022 2:52 PM Amit Kapila  wrote:
> On Tue, Mar 8, 2022 at 9:37 AM Peter Smith  wrote:
> >
> > Please find below some review comments for v29.
> >
> > ==
> >
> > 1. src/backend/replication/logical/worker.c -
> > worker_post_error_processing
> >
> > +/*
> > + * Abort and cleanup the current transaction, then do post-error 
> > processing.
> > + * This function must be called in a PG_CATCH() block.
> > + */
> > +static void
> > +worker_post_error_processing(void)
> >
> > The function comment and function name are too vague/generic. I guess
> > this is a hang-over from Sawada-san's proposed patch, but now since
> > this is only called when disabling the subscription so both the
> > comment and the function name should say that's what it is doing...
> >
> > e.g. rename to DisableSubscriptionOnError() or something similar.
> >
> > ~~~
> >
> > 2. src/backend/replication/logical/worker.c -
> > worker_post_error_processing
> >
> > + /* Notify the subscription has been disabled */ ereport(LOG,
> > + errmsg("logical replication subscription \"%s\" has been be disabled
> > due to an error",
> > +MySubscription->name));
> >
> >   proc_exit(0);
> >  }
> >
> > I know this is common code, but IMO it would be better to do the
> > proc_exit(0); from the caller in the PG_CATCH. Then I think the code
> > will be much easier to read the throw/exit logic, rather than now
> > where it is just calling some function that never returns...
> >
> > Alternatively, if you want the code how it is, then the function name
> > should give some hint that it is never going to return - e.g.
> > DisableSubscriptionOnErrorAndExit)
> >
> 
> I think we are already in error so maybe it is better to name it as
> DisableSubscriptionAndExit.
OK. Renamed.


 
> Few other comments:
> =
> 1.
> DisableSubscription()
> {
> ..
> +
> + LockSharedObject(SubscriptionRelationId, subid, 0,
> + AccessExclusiveLock);
> 
> Why do we need AccessExclusiveLock here? The Alter/Drop Subscription
> takes AccessExclusiveLock, so AccessShareLock should be sufficient unless
> we have a reason to use AccessExclusiveLock lock. The other similar usages in
> this file (pg_subscription.c) also take AccessShareLock.
Fixed.

 
> 2. Shall we mention this feature in conflict handling docs [1]:
> Now:
> To skip the transaction, the subscription needs to be disabled temporarily by
> ALTER SUBSCRIPTION ... DISABLE first.
> 
> After:
> To skip the transaction, the subscription needs to be disabled temporarily by
> ALTER SUBSCRIPTION ... DISABLE first or alternatively, the subscription can
> be used with the disable_on_error option.
> 
> Feel free to use something on the above lines, if you agree.
Agreed. Fixed.

At the same time, the attached v30 has incorporated
some rebase results of recent commit(d3e8368)
so that start_table_sync allocates the origin names
in long-lived context. Accoring to this, I modified
some comments on this function.

I made some comments for sending stats in
start_table_sync and start_apply united and concise,
which were pointed out by Peter Smith in [1].

[1] - 
https://www.postgresql.org/message-id/CAHut%2BPs3b8HjsVyo-Aygtnxbw1PiVOC9nvrN6dKxYtS4C8s%2Bgw%40mail.gmail.com

Kindly have a look at v30.

Best Regards,
Takamichi Osumi



v30-0001-Optionally-disable-subscriptions-on-error.patch
Description: v30-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2022-03-07 Thread Amit Kapila
On Tue, Mar 8, 2022 at 9:37 AM Peter Smith  wrote:
>
> Please find below some review comments for v29.
>
> ==
>
> 1. src/backend/replication/logical/worker.c - worker_post_error_processing
>
> +/*
> + * Abort and cleanup the current transaction, then do post-error processing.
> + * This function must be called in a PG_CATCH() block.
> + */
> +static void
> +worker_post_error_processing(void)
>
> The function comment and function name are too vague/generic. I guess
> this is a hang-over from Sawada-san's proposed patch, but now since
> this is only called when disabling the subscription so both the
> comment and the function name should say that's what it is doing...
>
> e.g. rename to DisableSubscriptionOnError() or something similar.
>
> ~~~
>
> 2. src/backend/replication/logical/worker.c - worker_post_error_processing
>
> + /* Notify the subscription has been disabled */
> + ereport(LOG,
> + errmsg("logical replication subscription \"%s\" has been be disabled
> due to an error",
> +MySubscription->name));
>
>   proc_exit(0);
>  }
>
> I know this is common code, but IMO it would be better to do the
> proc_exit(0); from the caller in the PG_CATCH. Then I think the code
> will be much easier to read the throw/exit logic, rather than now
> where it is just calling some function that never returns...
>
> Alternatively, if you want the code how it is, then the function name
> should give some hint that it is never going to return - e.g.
> DisableSubscriptionOnErrorAndExit)
>

I think we are already in error so maybe it is better to name it as
DisableSubscriptionAndExit.

Few other comments:
=
1.
DisableSubscription()
{
..
+
+ LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock);

Why do we need AccessExclusiveLock here? The Alter/Drop Subscription
takes AccessExclusiveLock, so AccessShareLock should be sufficient
unless we have a reason to use AccessExclusiveLock lock. The other
similar usages in this file (pg_subscription.c) also take
AccessShareLock.

2. Shall we mention this feature in conflict handling docs [1]:
Now:
To skip the transaction, the subscription needs to be disabled
temporarily by ALTER SUBSCRIPTION ... DISABLE first.

After:
To skip the transaction, the subscription needs to be disabled
temporarily by ALTER SUBSCRIPTION ... DISABLE first or alternatively,
the subscription can be used with the disable_on_error option.

Feel free to use something on the above lines, if you agree.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-03-07 Thread Peter Smith
Please find below some review comments for v29.

==

1. src/backend/replication/logical/worker.c - worker_post_error_processing

+/*
+ * Abort and cleanup the current transaction, then do post-error processing.
+ * This function must be called in a PG_CATCH() block.
+ */
+static void
+worker_post_error_processing(void)

The function comment and function name are too vague/generic. I guess
this is a hang-over from Sawada-san's proposed patch, but now since
this is only called when disabling the subscription so both the
comment and the function name should say that's what it is doing...

e.g. rename to DisableSubscriptionOnError() or something similar.

~~~

2. src/backend/replication/logical/worker.c - worker_post_error_processing

+ /* Notify the subscription has been disabled */
+ ereport(LOG,
+ errmsg("logical replication subscription \"%s\" has been be disabled
due to an error",
+MySubscription->name));

  proc_exit(0);
 }

I know this is common code, but IMO it would be better to do the
proc_exit(0); from the caller in the PG_CATCH. Then I think the code
will be much easier to read the throw/exit logic, rather than now
where it is just calling some function that never returns...

Alternatively, if you want the code how it is, then the function name
should give some hint that it is never going to return - e.g.
DisableSubscriptionOnErrorAndExit)

~~~

3. src/backend/replication/logical/worker.c - start_table_sync

+ {
+ /*
+ * Abort the current transaction so that we send the stats message
+ * in an idle state.
+ */
+ AbortOutOfAnyTransaction();
+
+ /* Report the worker failed during table synchronization */
+ pgstat_report_subscription_error(MySubscription->oid, false);
+
+ PG_RE_THROW();
+ }

(This is a repeat of a previous comment from [1] comment #2)

I felt the separation of those 2 statements and comments makes the
code less clean than it could/should be. IMO they should be grouped
together.

SUGGESTED

/*
* Report the worker failed during table synchronization. Abort the
* current transaction so that the stats message is sent in an idle
* state.
*/
AbortOutOfAnyTransaction();
pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

~~~

4. src/backend/replication/logical/worker.c - start_apply

+ {
+ /*
+ * Abort the current transaction so that we send the stats message
+ * in an idle state.
+ */
+ AbortOutOfAnyTransaction();
+
+ /* Report the worker failed while applying changes */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());
+
+ PG_RE_THROW();
+ }

(same as #3 but comment says "while applying changes")

SUGGESTED

/*
* Report the worker failed while applying changing. Abort the current
* transaction so that the stats message is sent in an idle state.
*/
AbortOutOfAnyTransaction();
pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPucrizJpqhSyD7dKj1yRkNMskqmiekD_RRqYpdDdusMRQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Optionally automatically disable logical replication subscriptions on error

2022-03-07 Thread osumi.takami...@fujitsu.com
On Monday, March 7, 2022 5:45 PM Amit Kaila  wrote:
> On Mon, Mar 7, 2022 at 4:55 AM Peter Smith 
> wrote:
> >
> > On Fri, Mar 4, 2022 at 5:55 PM Masahiko Sawada
>  wrote:
> > >
> > > ---
> > > +/*
> > > + * First, ensure that we log the error message so
> > > that it won't be
> > > + * lost if some other internal error occurs in the
> > > following code.
> > > + * Then, abort the current transaction and send the
> > > stats message of
> > > + * the table synchronization failure in an idle state.
> > > + */
> > > +HOLD_INTERRUPTS();
> > > +EmitErrorReport();
> > > +FlushErrorState();
> > > +AbortOutOfAnyTransaction();
> > > +RESUME_INTERRUPTS();
> > > +
> > > + pgstat_report_subscription_error(MySubscription->oid, false);
> > > +
> > > +if (MySubscription->disableonerr)
> > > +{
> > > +DisableSubscriptionOnError();
> > > +proc_exit(0);
> > > +}
> > > +
> > > +PG_RE_THROW();
> > >
> > > If the disableonerr is false, the same error is reported twice.
> > > Also, the code in PG_CATCH() in both start_apply() and
> > > start_table_sync() are almost the same. Can we create a common
> > > function to do post-error processing?
> > >
> > > The worker should exit with return code 1.
> > >
> > > I've attached a fixup patch for changes to worker.c for your
> > > reference. Feel free to adopt the changes.
> >
> > The way that common function is implemented required removal of the
> > existing PG_RE_THROW logic, which in turn was only possible using
> > special knowledge that this just happens to be the last try/catch
> > block for the apply worker.
> >
> 
> I think we should re_throw the error in case we have not handled it by 
> disabling
> the subscription (in which case we can exit with success code (0)).
Agreed. Fixed the patch so that it use re_throw.

Another point I changed from v28 is the order
to call AbortOutOfAnyTransaction and FlushErrorState,
which now is more aligned with other places.

Kindly check the attached v29.

Best Regards,
Takamichi Osumi



v29-0001-Optionally-disable-subscriptions-on-error.patch
Description: v29-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2022-03-07 Thread Amit Kapila
On Mon, Mar 7, 2022 at 4:55 AM Peter Smith  wrote:
>
> On Fri, Mar 4, 2022 at 5:55 PM Masahiko Sawada  wrote:
> >
> > ---
> > +/*
> > + * First, ensure that we log the error message so
> > that it won't be
> > + * lost if some other internal error occurs in the
> > following code.
> > + * Then, abort the current transaction and send the
> > stats message of
> > + * the table synchronization failure in an idle state.
> > + */
> > +HOLD_INTERRUPTS();
> > +EmitErrorReport();
> > +FlushErrorState();
> > +AbortOutOfAnyTransaction();
> > +RESUME_INTERRUPTS();
> > +pgstat_report_subscription_error(MySubscription->oid, 
> > false);
> > +
> > +if (MySubscription->disableonerr)
> > +{
> > +DisableSubscriptionOnError();
> > +proc_exit(0);
> > +}
> > +
> > +PG_RE_THROW();
> >
> > If the disableonerr is false, the same error is reported twice. Also,
> > the code in PG_CATCH() in both start_apply() and start_table_sync()
> > are almost the same. Can we create a common function to do post-error
> > processing?
> >
> > The worker should exit with return code 1.
> >
> > I've attached a fixup patch for changes to worker.c for your
> > reference. Feel free to adopt the changes.
>
> The way that common function is implemented required removal of the
> existing PG_RE_THROW logic, which in turn was only possible using
> special knowledge that this just happens to be the last try/catch
> block for the apply worker.
>

I think we should re_throw the error in case we have not handled it by
disabling the subscription (in which case we can exit with success
code (0)).

-- 
With Regards,
Amit Kapila.




RE: Optionally automatically disable logical replication subscriptions on error

2022-03-06 Thread osumi.takami...@fujitsu.com
On Monday, March 7, 2022 12:01 PM Shi, Yu/侍 雨  wrote:
> On Wed, Mar 2, 2022 5:39 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Attached an updated patch v26.
> >
> 
> Thanks for your patch. A comment on the document.
Hi, thank you for checking my patch !


> @@ -7771,6 +7771,16 @@ SCRAM-SHA-256$iteration
> count:
> 
>   
>
> +   subdisableonerr bool
> +  
> +  
> +   If true, the subscription will be disabled if one of its workers
> +   detects an error
> +  
> + 
> +
> + 
> +  
> subconninfo text
>
>
> 
> The document for "subdisableonerr" option is placed after "The following
> parameters control what happens during subscription creation: ". I think it
> should be placed after "The following parameters control the subscription's
> replication behavior after it has been created: ", right?
Addressed your comment for create_subscription.sgml
(not for catalogs.sgml).

Attached an updated patch v28.


Best Regards,
Takamichi Osumi



v28-0001-Optionally-disable-subscriptions-on-error.patch
Description: v28-0001-Optionally-disable-subscriptions-on-error.patch


RE: Optionally automatically disable logical replication subscriptions on error

2022-03-06 Thread osumi.takami...@fujitsu.com
On Friday, March 4, 2022 3:55 PM Masahiko Sawada  wrote:
> Thank you for updating the patch.
> 
> Here are some comments on v26 patch:
Thank you for your review !



> +/*
> + * Disable the current subscription.
> + */
> +static void
> +DisableSubscriptionOnError(void)
> 
> This function now just updates the pg_subscription catalog so can we move it
> to pg_subscritpion.c while having this function accept the subscription OID to
> disable? If you agree, the function comment will also need to be updated.
Agreed. Fixed.


> ---
> +/*
> + * First, ensure that we log the error message so
> that it won't be
> + * lost if some other internal error occurs in the
> following code.
> + * Then, abort the current transaction and send the
> stats message of
> + * the table synchronization failure in an idle state.
> + */
> +HOLD_INTERRUPTS();
> +EmitErrorReport();
> +FlushErrorState();
> +AbortOutOfAnyTransaction();
> +RESUME_INTERRUPTS();
> +pgstat_report_subscription_error(MySubscription->oid,
> + false);
> +
> +if (MySubscription->disableonerr)
> +{
> +DisableSubscriptionOnError();
> +proc_exit(0);
> +}
> +
> +PG_RE_THROW();
> 
> If the disableonerr is false, the same error is reported twice. Also, the 
> code in
> PG_CATCH() in both start_apply() and start_table_sync() are almost the same.
> Can we create a common function to do post-error processing?
Yes. Also, calling PG_RE_THROW wasn't appropriate,
because in the previous v26, for the second error you mentioned,
the patch didn't call errstart when disable_on_error = false.
This was introduced by recent patch refactoring around this code and the rebase
of this patch, but has been fixed by your suggestion.


> The worker should exit with return code 1.
> I've attached a fixup patch for changes to worker.c for your reference. Feel 
> free
> to adopt the changes.
Yes. I adopted almost all of your suggestion.
One thing I fixed was a comment that mentioned table sync
in worker_post_error_processing(), because start_apply()
also uses the function.


> 
> ---
> +
> +# Confirm that we have finished the table sync.
> +is( $node_subscriber->safe_psql(
> +'postgres', qq(SELECT MAX(i), COUNT(*) FROM tbl)),
> +"1|3",
> +"subscription sub replicated data");
> +
> 
> Can we store the result to a local variable to check? I think it's more 
> consistent
> with other tap tests.
Agreed. Fixed.


Attached the v27. Kindly review the patch.


Best Regards,
Takamichi Osumi



v27-0001-Optionally-disable-subscriptions-on-error.patch
Description: v27-0001-Optionally-disable-subscriptions-on-error.patch


RE: Optionally automatically disable logical replication subscriptions on error

2022-03-06 Thread shiy.f...@fujitsu.com
On Wed, Mar 2, 2022 5:39 PM osumi.takami...@fujitsu.com 
 wrote:
> 
> Attached an updated patch v26.
> 

Thanks for your patch. A comment on the document.

@@ -7771,6 +7771,16 @@ SCRAM-SHA-256$iteration 
count:
 
  
   
+   subdisableonerr bool
+  
+  
+   If true, the subscription will be disabled if one of its workers
+   detects an error
+  
+ 
+
+ 
+  
subconninfo text
   
   

The document for "subdisableonerr" option is placed after "The following
parameters control what happens during subscription creation: ". I think it
should be placed after "The following parameters control the subscription's
replication behavior after it has been created: ", right?

Regards,
Shi yu


Re: Optionally automatically disable logical replication subscriptions on error

2022-03-06 Thread Peter Smith
On Fri, Mar 4, 2022 at 5:55 PM Masahiko Sawada  wrote:
>
> On Wed, Mar 2, 2022 at 6:38 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Wednesday, March 2, 2022 12:47 PM Masahiko Sawada 
> >  wrote:
> > > After more thoughts, should we do both AbortOutOfAnyTransaction() and 
> > > error
> > > message handling while holding interrupts? That is,
> > >
> > > HOLD_INTERRUPTS();
> > > EmitErrorReport();
> > > FlushErrorState();
> > > AbortOutOfAny Transaction();
> > > RESUME_INTERRUPTS();
> > > pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_work
> > > er());
> > >
> > > I think it's better that we do clean up first and then do other works 
> > > such as
> > > sending the message to the stats collector and updating the catalog.
> > I agree. Fixed. Along with this change, I corrected the header comment of
> > DisableSubscriptionOnError, too.
> >
> >
> > > Here are some comments on v24 patch:
> > >
> > > +/* Look up our subscription in the catalogs */
> > > +tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
> > > +
> > > CStringGetDatum(MySubscription->name));
> > >
> > > s/catalogs/catalog/
> > >
> > > Why don't we use SUBSCRIPTIONOID with MySubscription->oid?
> > Changed.
> >
> >
> > > ---
> > > +if (!HeapTupleIsValid(tup))
> > > +ereport(ERROR,
> > > +errcode(ERRCODE_UNDEFINED_OBJECT),
> > > +errmsg("subscription \"%s\" does not
> > > exist",
> > > +   MySubscription->name));
> > >
> > > I think we should use elog() here rather than ereport() since it's a
> > > should-not-happen error.
> > Fixed
> >
> >
> > > ---
> > > +/* Notify the subscription will be no longer valid */
> > >
> > > I'd suggest rephrasing it to like "Notify the subscription will be 
> > > disabled". (the
> > > subscription is still valid actually, but just disabled).
> > Fixed. Also, I've made this sentence past one, because of the code place
> > change below.
> >
> >
> > > ---
> > > +/* Notify the subscription will be no longer valid */
> > > +ereport(LOG,
> > > +errmsg("logical replication subscription
> > > \"%s\" will be disabled due to an error",
> > > +   MySubscription->name));
> > > +
> > >
> > > I think we can report the log at the end of this function rather than 
> > > during the
> > > transaction.
> > Fixed. In this case, I needed to adjust the comment to indicate the 
> > processing
> > to disable the sub has *completed* as well.
> >
> > > ---
> > > +my $cmd = qq(
> > > +CREATE TABLE tbl (i INT);
> > > +ALTER TABLE tbl REPLICA IDENTITY FULL;
> > > +CREATE INDEX tbl_idx ON tbl(i));
> > >
> > > I think we don't need to set REPLICA IDENTITY FULL to this table since 
> > > there is
> > > notupdate/delete.
> > >
> > > Do we need tbl_idx?
> > Removed both the replica identity and tbl_idx;
> >
> >
> > > ---
> > > +$cmd = qq(
> > > +SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr WHERE
> > > +sr.srsubstate IN ('s', 'r'));
> > > +$node_subscriber->poll_query_until('postgres', $cmd);
> > >
> > > It seems better to add a condition of srrelid just in case.
> > Makes sense. Fixed.
> >
> >
> > > ---
> > > +$cmd = qq(
> > > +SELECT count(1) = 1 FROM pg_catalog.pg_subscription s WHERE
> > > s.subname =
> > > +'sub' AND s.subenabled IS FALSE);
> > > +$node_subscriber->poll_query_until('postgres', $cmd)
> > > +  or die "Timed out while waiting for subscriber to be disabled";
> > >
> > > I think that it's more natural to directly check the subscription's 
> > > subenabled.
> > > For example:
> > >
> > > SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub';
> > Fixed. I modified another code similar to this for tablesync error as well.
> >
> >
> > > ---
> > > +$cmd = q(ALTER SUBSCRIPTION sub ENABLE);
> > > +$node_subscriber->safe_psql('postgres', $cmd); $cmd = q(SELECT
> > > COUNT(1)
> > > += 3 FROM tbl WHERE i = 3);
> > > +$node_subscriber->poll_query_until('postgres', $cmd)
> > > +  or die "Timed out while waiting for applying";
> > >
> > > I think it's better to wait for the subscriber to catch up and check the 
> > > query
> > > result instead of using poll_query_until() so that we can check the query 
> > > result
> > > in case where the test fails.
> > I modified the code to wait for the subscriber and deleted poll_query_until.
> > Also, when I consider the test failure for this test as you mentioned,
> > expecting and checking the number of return value that equals 3
> > would be better. So, I expressed this point in my test as well,
> > according to your advice.
> >
> >
> > > ---
> > > +$cmd = qq(DROP INDEX tbl_unique);
> > > +$node_subscriber->safe_psql('postgres', $cmd);
> > >
> > > In the newly added tap tests, all queries are consistently assigned to 
> > > $cmd and
> > > executed even when the query is used only once. It seems a different 
> > > style from
> 

Re: Optionally automatically disable logical replication subscriptions on error

2022-03-03 Thread Masahiko Sawada
On Wed, Mar 2, 2022 at 6:38 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Wednesday, March 2, 2022 12:47 PM Masahiko Sawada  
> wrote:
> > After more thoughts, should we do both AbortOutOfAnyTransaction() and error
> > message handling while holding interrupts? That is,
> >
> > HOLD_INTERRUPTS();
> > EmitErrorReport();
> > FlushErrorState();
> > AbortOutOfAny Transaction();
> > RESUME_INTERRUPTS();
> > pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_work
> > er());
> >
> > I think it's better that we do clean up first and then do other works such 
> > as
> > sending the message to the stats collector and updating the catalog.
> I agree. Fixed. Along with this change, I corrected the header comment of
> DisableSubscriptionOnError, too.
>
>
> > Here are some comments on v24 patch:
> >
> > +/* Look up our subscription in the catalogs */
> > +tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
> > +
> > CStringGetDatum(MySubscription->name));
> >
> > s/catalogs/catalog/
> >
> > Why don't we use SUBSCRIPTIONOID with MySubscription->oid?
> Changed.
>
>
> > ---
> > +if (!HeapTupleIsValid(tup))
> > +ereport(ERROR,
> > +errcode(ERRCODE_UNDEFINED_OBJECT),
> > +errmsg("subscription \"%s\" does not
> > exist",
> > +   MySubscription->name));
> >
> > I think we should use elog() here rather than ereport() since it's a
> > should-not-happen error.
> Fixed
>
>
> > ---
> > +/* Notify the subscription will be no longer valid */
> >
> > I'd suggest rephrasing it to like "Notify the subscription will be 
> > disabled". (the
> > subscription is still valid actually, but just disabled).
> Fixed. Also, I've made this sentence past one, because of the code place
> change below.
>
>
> > ---
> > +/* Notify the subscription will be no longer valid */
> > +ereport(LOG,
> > +errmsg("logical replication subscription
> > \"%s\" will be disabled due to an error",
> > +   MySubscription->name));
> > +
> >
> > I think we can report the log at the end of this function rather than 
> > during the
> > transaction.
> Fixed. In this case, I needed to adjust the comment to indicate the processing
> to disable the sub has *completed* as well.
>
> > ---
> > +my $cmd = qq(
> > +CREATE TABLE tbl (i INT);
> > +ALTER TABLE tbl REPLICA IDENTITY FULL;
> > +CREATE INDEX tbl_idx ON tbl(i));
> >
> > I think we don't need to set REPLICA IDENTITY FULL to this table since 
> > there is
> > notupdate/delete.
> >
> > Do we need tbl_idx?
> Removed both the replica identity and tbl_idx;
>
>
> > ---
> > +$cmd = qq(
> > +SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr WHERE
> > +sr.srsubstate IN ('s', 'r'));
> > +$node_subscriber->poll_query_until('postgres', $cmd);
> >
> > It seems better to add a condition of srrelid just in case.
> Makes sense. Fixed.
>
>
> > ---
> > +$cmd = qq(
> > +SELECT count(1) = 1 FROM pg_catalog.pg_subscription s WHERE
> > s.subname =
> > +'sub' AND s.subenabled IS FALSE);
> > +$node_subscriber->poll_query_until('postgres', $cmd)
> > +  or die "Timed out while waiting for subscriber to be disabled";
> >
> > I think that it's more natural to directly check the subscription's 
> > subenabled.
> > For example:
> >
> > SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub';
> Fixed. I modified another code similar to this for tablesync error as well.
>
>
> > ---
> > +$cmd = q(ALTER SUBSCRIPTION sub ENABLE);
> > +$node_subscriber->safe_psql('postgres', $cmd); $cmd = q(SELECT
> > COUNT(1)
> > += 3 FROM tbl WHERE i = 3);
> > +$node_subscriber->poll_query_until('postgres', $cmd)
> > +  or die "Timed out while waiting for applying";
> >
> > I think it's better to wait for the subscriber to catch up and check the 
> > query
> > result instead of using poll_query_until() so that we can check the query 
> > result
> > in case where the test fails.
> I modified the code to wait for the subscriber and deleted poll_query_until.
> Also, when I consider the test failure for this test as you mentioned,
> expecting and checking the number of return value that equals 3
> would be better. So, I expressed this point in my test as well,
> according to your advice.
>
>
> > ---
> > +$cmd = qq(DROP INDEX tbl_unique);
> > +$node_subscriber->safe_psql('postgres', $cmd);
> >
> > In the newly added tap tests, all queries are consistently assigned to $cmd 
> > and
> > executed even when the query is used only once. It seems a different style 
> > from
> > the one in other tap tests. Is there any reason why we do this style for 
> > all queries
> > in the tap tests?
> Fixed. I removed the 'cmd' variable itself.
>
>
> Attached an updated patch v26.

Thank you for updating the patch.

Here are some comments on v26 patch:

+/*
+ * Disable the current subscription.
+ */
+static void

RE: Optionally automatically disable logical replication subscriptions on error

2022-03-02 Thread osumi.takami...@fujitsu.com
On Wednesday, March 2, 2022 12:47 PM Masahiko Sawada  
wrote:
> After more thoughts, should we do both AbortOutOfAnyTransaction() and error
> message handling while holding interrupts? That is,
> 
> HOLD_INTERRUPTS();
> EmitErrorReport();
> FlushErrorState();
> AbortOutOfAny Transaction();
> RESUME_INTERRUPTS();
> pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_work
> er());
> 
> I think it's better that we do clean up first and then do other works such as
> sending the message to the stats collector and updating the catalog.
I agree. Fixed. Along with this change, I corrected the header comment of
DisableSubscriptionOnError, too.


> Here are some comments on v24 patch:
> 
> +/* Look up our subscription in the catalogs */
> +tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
> +
> CStringGetDatum(MySubscription->name));
> 
> s/catalogs/catalog/
> 
> Why don't we use SUBSCRIPTIONOID with MySubscription->oid?
Changed.


> ---
> +if (!HeapTupleIsValid(tup))
> +ereport(ERROR,
> +errcode(ERRCODE_UNDEFINED_OBJECT),
> +errmsg("subscription \"%s\" does not
> exist",
> +   MySubscription->name));
> 
> I think we should use elog() here rather than ereport() since it's a
> should-not-happen error.
Fixed


> ---
> +/* Notify the subscription will be no longer valid */
> 
> I'd suggest rephrasing it to like "Notify the subscription will be disabled". 
> (the
> subscription is still valid actually, but just disabled).
Fixed. Also, I've made this sentence past one, because of the code place
change below.

 
> ---
> +/* Notify the subscription will be no longer valid */
> +ereport(LOG,
> +errmsg("logical replication subscription
> \"%s\" will be disabled due to an error",
> +   MySubscription->name));
> +
> 
> I think we can report the log at the end of this function rather than during 
> the
> transaction.
Fixed. In this case, I needed to adjust the comment to indicate the processing
to disable the sub has *completed* as well.

> ---
> +my $cmd = qq(
> +CREATE TABLE tbl (i INT);
> +ALTER TABLE tbl REPLICA IDENTITY FULL;
> +CREATE INDEX tbl_idx ON tbl(i));
> 
> I think we don't need to set REPLICA IDENTITY FULL to this table since there 
> is
> notupdate/delete.
> 
> Do we need tbl_idx?
Removed both the replica identity and tbl_idx;


> ---
> +$cmd = qq(
> +SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr WHERE
> +sr.srsubstate IN ('s', 'r'));
> +$node_subscriber->poll_query_until('postgres', $cmd);
> 
> It seems better to add a condition of srrelid just in case.
Makes sense. Fixed.


> ---
> +$cmd = qq(
> +SELECT count(1) = 1 FROM pg_catalog.pg_subscription s WHERE
> s.subname =
> +'sub' AND s.subenabled IS FALSE);
> +$node_subscriber->poll_query_until('postgres', $cmd)
> +  or die "Timed out while waiting for subscriber to be disabled";
> 
> I think that it's more natural to directly check the subscription's 
> subenabled.
> For example:
> 
> SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub';
Fixed. I modified another code similar to this for tablesync error as well.


> ---
> +$cmd = q(ALTER SUBSCRIPTION sub ENABLE);
> +$node_subscriber->safe_psql('postgres', $cmd); $cmd = q(SELECT
> COUNT(1)
> += 3 FROM tbl WHERE i = 3);
> +$node_subscriber->poll_query_until('postgres', $cmd)
> +  or die "Timed out while waiting for applying";
> 
> I think it's better to wait for the subscriber to catch up and check the query
> result instead of using poll_query_until() so that we can check the query 
> result
> in case where the test fails.
I modified the code to wait for the subscriber and deleted poll_query_until.
Also, when I consider the test failure for this test as you mentioned,
expecting and checking the number of return value that equals 3
would be better. So, I expressed this point in my test as well,
according to your advice.


> ---
> +$cmd = qq(DROP INDEX tbl_unique);
> +$node_subscriber->safe_psql('postgres', $cmd);
> 
> In the newly added tap tests, all queries are consistently assigned to $cmd 
> and
> executed even when the query is used only once. It seems a different style 
> from
> the one in other tap tests. Is there any reason why we do this style for all 
> queries
> in the tap tests?
Fixed. I removed the 'cmd' variable itself.


Attached an updated patch v26.

Best Regards,
Takamichi Osumi



v26-0001-Optionally-disable-subscriptions-on-error.patch
Description: v26-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2022-03-01 Thread Masahiko Sawada
On Wed, Mar 2, 2022 at 9:34 AM Peter Smith  wrote:
>
> Please see below my review comments for v24.
>
> ==
>
> 1. src/backend/replication/logical/worker.c - start_table_sync
>
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid, 
> !am_tablesync_worker());
>
> (This review comment is just FYI in case you did not do this deliberately)
>
> FYI, you didn't really need to call am_tablesync_worker() here because
> it is already asserted for the sync phase that it MUST be a tablesync
> worker.
>
> OTOH, IMO it documents the purpose of the parm so if it was deliberate
> then that is OK too.
>
> ~~~
>
> 2. src/backend/replication/logical/worker.c - start_table_sync
>
> + PG_CATCH();
> + {
> + /*
> + * Abort the current transaction so that we send the stats message in
> + * an idle state.
> + */
> + AbortOutOfAnyTransaction();
> +
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid, 
> !am_tablesync_worker());
> +
>
> [Maybe you will say that this review comment is unrelated to
> disable_on_err, but since this is a totally new/refactored function
> then I think maybe there is no problem to make this change at the same
> time. Anyway there is no function change, it is just rearranging some
> comments.]
>
> I felt the separation of those 2 statements and comments makes that
> code less clean than it could/should be. IMO they should be grouped
> together.
>
> SUGGESTED
> /*
> * Report the worker failed during table synchronization. Abort the
> * current transaction so that the stats message is sent in an idle
> * state.
> */
> AbortOutOfAnyTransaction();
> pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

After more thoughts, should we do both AbortOutOfAnyTransaction() and
error message handling while holding interrupts? That is,

HOLD_INTERRUPTS();
EmitErrorReport();
FlushErrorState();
AbortOutOfAny Transaction();
RESUME_INTERRUPTS();
pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

I think it's better that we do clean up first and then do other works
such as sending the message to the stats collector and updating the
catalog.

Here are some comments on v24 patch:

+/* Look up our subscription in the catalogs */
+tup = SearchSysCacheCopy2(SUBSCRIPTIONNAME, MyDatabaseId,
+
CStringGetDatum(MySubscription->name));

s/catalogs/catalog/

Why don't we use SUBSCRIPTIONOID with MySubscription->oid?

---
+if (!HeapTupleIsValid(tup))
+ereport(ERROR,
+errcode(ERRCODE_UNDEFINED_OBJECT),
+errmsg("subscription \"%s\" does not exist",
+   MySubscription->name));

I think we should use elog() here rather than ereport() since it's a
should-not-happen error.

---
+/* Notify the subscription will be no longer valid */

I'd suggest rephrasing it to like "Notify the subscription will be
disabled". (the subscription is still valid actually, but just
disabled).

---
+/* Notify the subscription will be no longer valid */
+ereport(LOG,
+errmsg("logical replication subscription
\"%s\" will be disabled due to an error",
+   MySubscription->name));
+

I think we can report the log at the end of this function rather than
during the transaction.

---
+my $cmd = qq(
+CREATE TABLE tbl (i INT);
+ALTER TABLE tbl REPLICA IDENTITY FULL;
+CREATE INDEX tbl_idx ON tbl(i));

I think we don't need to set REPLICA IDENTITY FULL to this table since
there is notupdate/delete.

Do we need tbl_idx?

---
+$cmd = qq(
+SELECT COUNT(1) = 1 FROM pg_catalog.pg_subscription_rel sr
+WHERE sr.srsubstate IN ('s', 'r'));
+$node_subscriber->poll_query_until('postgres', $cmd);

It seems better to add a condition of srrelid just in case.

---
+$cmd = qq(
+SELECT count(1) = 1 FROM pg_catalog.pg_subscription s
+WHERE s.subname = 'sub' AND s.subenabled IS FALSE);
+$node_subscriber->poll_query_until('postgres', $cmd)
+  or die "Timed out while waiting for subscriber to be disabled";

I think that it's more natural to directly check the subscription's
subenabled. For example:

SELECT subenabled = false FROM pg_subscription WHERE subname = 'sub';

---
+$cmd = q(ALTER SUBSCRIPTION sub ENABLE);
+$node_subscriber->safe_psql('postgres', $cmd);
+$cmd = q(SELECT COUNT(1) = 3 FROM tbl WHERE i = 3);
+$node_subscriber->poll_query_until('postgres', $cmd)
+  or die "Timed out while waiting for applying";

I think it's better to wait for the subscriber to catch up and check
the query result instead of using poll_query_until() so that we can
check the query result in case where the test fails.

---
+$cmd = qq(DROP INDEX tbl_unique);
+$node_subscriber->safe_psql('postgres', $cmd);

In the newly added tap tests, all queries are consistently assigned to
$cmd and executed even 

RE: Optionally automatically disable logical replication subscriptions on error

2022-03-01 Thread osumi.takami...@fujitsu.com
On Wednesday, March 2, 2022 9:34 AM Peter Smith  wrote:
> Please see below my review comments for v24.
Thank you for checking my patch !

 
> ==
> 
> 1. src/backend/replication/logical/worker.c - start_table_sync
> 
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> 
> (This review comment is just FYI in case you did not do this deliberately)
> 
> FYI, you didn't really need to call am_tablesync_worker() here because it is
> already asserted for the sync phase that it MUST be a tablesync worker.
> 
> OTOH, IMO it documents the purpose of the parm so if it was deliberate then
> that is OK too.
Fixed.


> ~~~
> 
> 2. src/backend/replication/logical/worker.c - start_table_sync
> 
> + PG_CATCH();
> + {
> + /*
> + * Abort the current transaction so that we send the stats message in
> + * an idle state.
> + */
> + AbortOutOfAnyTransaction();
> +
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> +
> 
> [Maybe you will say that this review comment is unrelated to disable_on_err,
> but since this is a totally new/refactored function then I think maybe there 
> is no
> problem to make this change at the same time. Anyway there is no function
> change, it is just rearranging some comments.]
> 
> I felt the separation of those 2 statements and comments makes that code less
> clean than it could/should be. IMO they should be grouped together.
> 
> SUGGESTED
> /*
> * Report the worker failed during table synchronization. Abort the
> * current transaction so that the stats message is sent in an idle
> * state.
> */
> AbortOutOfAnyTransaction();
> pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_work
> er());
I think this is OK. Thank you for suggestion. Fixed.



> ~~~
> 
> 3. src/backend/replication/logical/worker.c - start_apply
> 
> + /*
> + * Abort the current transaction so that we send the stats message in
> + * an idle state.
> + */
> + AbortOutOfAnyTransaction();
> +
> + /* Report the worker failed during the application of the change */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> 
> Same comment as #2 above, but this code fragment is in start_apply function.
Fixed.


> ~~~
> 
> 4. src/test/subscription/t/029_disable_on_error.pl - comment
> 
> +# Drop the unique index on the sub and re-enabled the subscription.
> +# Then, confirm that we have finished the apply.
> 
> SUGGESTED (tweak the comment wording)
> # Drop the unique index on the sub and re-enable the subscription.
> # Then, confirm that the previously failing insert was applied OK.
Fixed.


Best Regards,
Takamichi Osumi



v25-0001-Optionally-disable-subscriptions-on-error.patch
Description: v25-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2022-03-01 Thread Peter Smith
Please see below my review comments for v24.

==

1. src/backend/replication/logical/worker.c - start_table_sync

+ /* Report the worker failed during table synchronization */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

(This review comment is just FYI in case you did not do this deliberately)

FYI, you didn't really need to call am_tablesync_worker() here because
it is already asserted for the sync phase that it MUST be a tablesync
worker.

OTOH, IMO it documents the purpose of the parm so if it was deliberate
then that is OK too.

~~~

2. src/backend/replication/logical/worker.c - start_table_sync

+ PG_CATCH();
+ {
+ /*
+ * Abort the current transaction so that we send the stats message in
+ * an idle state.
+ */
+ AbortOutOfAnyTransaction();
+
+ /* Report the worker failed during table synchronization */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());
+

[Maybe you will say that this review comment is unrelated to
disable_on_err, but since this is a totally new/refactored function
then I think maybe there is no problem to make this change at the same
time. Anyway there is no function change, it is just rearranging some
comments.]

I felt the separation of those 2 statements and comments makes that
code less clean than it could/should be. IMO they should be grouped
together.

SUGGESTED
/*
* Report the worker failed during table synchronization. Abort the
* current transaction so that the stats message is sent in an idle
* state.
*/
AbortOutOfAnyTransaction();
pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

~~~

3. src/backend/replication/logical/worker.c - start_apply

+ /*
+ * Abort the current transaction so that we send the stats message in
+ * an idle state.
+ */
+ AbortOutOfAnyTransaction();
+
+ /* Report the worker failed during the application of the change */
+ pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());

Same comment as #2 above, but this code fragment is in start_apply function.

~~~

4. src/test/subscription/t/029_disable_on_error.pl - comment

+# Drop the unique index on the sub and re-enabled the subscription.
+# Then, confirm that we have finished the apply.

SUGGESTED (tweak the comment wording)
# Drop the unique index on the sub and re-enable the subscription.
# Then, confirm that the previously failing insert was applied OK.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Optionally automatically disable logical replication subscriptions on error

2022-02-28 Thread osumi.takami...@fujitsu.com
On Tuesday, March 1, 2022 9:49 AM Peter Smith  wrote:
> Please see below my review comments for v22.
> 
> ==
> 
> 1. Commit message
> 
> "table sync worker" -> "tablesync worker"
Fixed.

> ~~~
> 
> 2. doc/src/sgml/catalogs.sgml
> 
> +  
> +   If true, the subscription will be disabled when subscription
> +   workers detect any errors
> +  
> 
> It felt a bit strange to say "subscription" 2x in the sentence, but I am not 
> sure
> how to improve it. Maybe like below?
> 
> BEFORE
> If true, the subscription will be disabled when subscription workers detect 
> any
> errors
> 
> SUGGESTED
> If true, the subscription will be disabled if one of its workers detects an 
> error
Fixed.


> ~~~
> 
> 3. src/backend/replication/logical/worker.c - DisableSubscriptionOnError
> 
> @@ -2802,6 +2803,69 @@ LogicalRepApplyLoop(XLogRecPtr
> last_received)  }
> 
>  /*
> + * Disable the current subscription, after error recovery processing.
> + */
> +static void
> +DisableSubscriptionOnError(void)
> 
> I thought the "after error recovery processing" part was a bit generic and 
> did not
> really say what it was doing.
> 
> BEFORE
> Disable the current subscription, after error recovery processing.
> SUGGESTED
> Disable the current subscription, after logging the error that caused this
> function to be called.
Fixed.

> ~~~
> 
> 4. src/backend/replication/logical/worker.c - start_apply
> 
> + if (MySubscription->disableonerr)
> + {
> + DisableSubscriptionOnError();
> + return;
> + }
> +
> + MemoryContextSwitchTo(ecxt);
> + PG_RE_THROW();
> + }
> + PG_END_TRY();
> 
> The current code looks correct, but I felt it is a bit tricky to easily see 
> the
> execution path after the return.
> 
> Since it will effectively just exit anyhow I think it will be simpler just to 
> do that
> explicitly right here instead of the 'return'. This will also make the code
> consistent with the same 'disableonerr' logic of the start_start_sync.
> 
> SUGGESTION
> if (MySubscription->disableonerr)
> {
> DisableSubscriptionOnError();
> proc_exit(0);
> }
Fixed.

> ~~~
> 
> 5. src/bin/pg_dump/pg_dump.c
> 
> @@ -4463,6 +4473,9 @@ dumpSubscription(Archive *fout, const
> SubscriptionInfo *subinfo)
>   if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
>   appendPQExpBufferStr(query, ", two_phase = on");
> 
> + if (strcmp(subinfo->subdisableonerr, "f") != 0)
> + appendPQExpBufferStr(query, ", disable_on_error = true");
> +
> 
> Although the code is correct, I think it would be more natural to set this 
> option
> as true when the user wants it true. e.g. check for "t"
> same as 'subbinary' is doing. This way, even if there was some
> unknown/corrupted value the code would do nothing, which is the behaviour
> you want...
> 
> SUGGESTION
> if (strcmp(subinfo->subdisableonerr, "t") == 0)
Fixed.


> ~~~
> 
> 6. src/include/catalog/pg_subscription.h
> 
> @@ -67,6 +67,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
> 
>   char subtwophasestate; /* Stream two-phase transactions */
> 
> + bool subdisableonerr; /* True if occurrence of apply errors
> + * should disable the subscription */
> 
> The comment seems not quite right because it's not just about apply errors. 
> E.g.
> I think any error in tablesync will cause disablement too.
> 
> BEFORE
> True if occurrence of apply errors should disable the subscription SUGGESTED
> True if a worker error should cause the subscription to be disabled
Fixed.


> ~~~
> 
> 7. src/test/regress/sql/subscription.sql - whitespace
> 
> +-- now it works
> +CREATE SUBSCRIPTION regress_testsub CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false,
> disable_on_error = false);
> +
> +\dRs+
> +
> +ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
> +
> +\dRs+
> +ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE); DROP
> +SUBSCRIPTION regress_testsub;
> +
> 
> I think should be a blank line after that last \dRs+ just like the other one,
> because it belongs logically with the code above it, not with the ALTER
> slot_name.
Fixed.


> ~~~
> 
> 8. src/test/subscription/t/028_disable_on_error.pl - filename
> 
> The 028 number needs to be bumped because there is already a TAP test
> called 028 now
This is already done in v22, so I've skipped this.

> ~~~
> 
> 9. src/test/subscription/t/028_disable_on_error.pl - missing test
> 
> There was no test case for the last combination where the user correct the
> apply worker problem: E.g. After a previous error/disable of the subscriber,
> remove the index, publish the inserts again, and check they get applied
> properly.
Fixed.

Attached the updated version v24.


Best Regards,
Takamichi Osumi



v24-0001-Optionally-disable-subscriptions-on-error.patch
Description: v24-0001-Optionally-disable-subscriptions-on-error.patch


RE: Optionally automatically disable logical replication subscriptions on error

2022-02-28 Thread osumi.takami...@fujitsu.com
On Friday, February 25, 2022 9:45 PM osumi.takami...@fujitsu.com 
 wrote:
> Kindly have a look at attached the v22.
> It has incorporated other improvements of TAP test, refinement of the
> DisableSubscriptionOnError function and so on.
The recent commit(7a85073) has changed the subscription workers
error handling. So, I rebased my disable_on_error patch first
for anyone who are interested in the review.

I'll incorporate incoming comments for v22 in my next version.

Best Regards,
Takamichi Osumi



v23-0001-Optionally-disable-subscriptions-on-error.patch
Description: v23-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2022-02-28 Thread Peter Smith
Please see below my review comments for v22.

==

1. Commit message

"table sync worker" -> "tablesync worker"

~~~

2. doc/src/sgml/catalogs.sgml

+  
+   If true, the subscription will be disabled when subscription
+   workers detect any errors
+  

It felt a bit strange to say "subscription" 2x in the sentence, but I
am not sure how to improve it. Maybe like below?

BEFORE
If true, the subscription will be disabled when subscription workers
detect any errors

SUGGESTED
If true, the subscription will be disabled if one of its workers
detects an error

~~~

3. src/backend/replication/logical/worker.c - DisableSubscriptionOnError

@@ -2802,6 +2803,69 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 }

 /*
+ * Disable the current subscription, after error recovery processing.
+ */
+static void
+DisableSubscriptionOnError(void)

I thought the "after error recovery processing" part was a bit generic
and did not really say what it was doing.

BEFORE
Disable the current subscription, after error recovery processing.
SUGGESTED
Disable the current subscription, after logging the error that caused
this function to be called.

~~~

4. src/backend/replication/logical/worker.c - start_apply

+ if (MySubscription->disableonerr)
+ {
+ DisableSubscriptionOnError();
+ return;
+ }
+
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
+ PG_END_TRY();

The current code looks correct, but I felt it is a bit tricky to
easily see the execution path after the return.

Since it will effectively just exit anyhow I think it will be simpler
just to do that explicitly right here instead of the 'return'. This
will also make the code consistent with the same 'disableonerr' logic
of the start_start_sync.

SUGGESTION
if (MySubscription->disableonerr)
{
DisableSubscriptionOnError();
proc_exit(0);
}

~~~

5. src/bin/pg_dump/pg_dump.c

@@ -4463,6 +4473,9 @@ dumpSubscription(Archive *fout, const
SubscriptionInfo *subinfo)
  if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
  appendPQExpBufferStr(query, ", two_phase = on");

+ if (strcmp(subinfo->subdisableonerr, "f") != 0)
+ appendPQExpBufferStr(query, ", disable_on_error = true");
+

Although the code is correct, I think it would be more natural to set
this option as true when the user wants it true. e.g. check for "t"
same as 'subbinary' is doing. This way, even if there was some
unknown/corrupted value the code would do nothing, which is the
behaviour you want...

SUGGESTION
if (strcmp(subinfo->subdisableonerr, "t") == 0)

~~~

6. src/include/catalog/pg_subscription.h

@@ -67,6 +67,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW

  char subtwophasestate; /* Stream two-phase transactions */

+ bool subdisableonerr; /* True if occurrence of apply errors
+ * should disable the subscription */

The comment seems not quite right because it's not just about apply
errors. E.g. I think any error in tablesync will cause disablement
too.

BEFORE
True if occurrence of apply errors should disable the subscription
SUGGESTED
True if a worker error should cause the subscription to be disabled

~~~

7. src/test/regress/sql/subscription.sql - whitespace

+-- now it works
+CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, disable_on_error = false);
+
+\dRs+
+
+ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
+
+\dRs+
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+

I think should be a blank line after that last \dRs+ just like the
other one, because it belongs logically with the code above it, not
with the ALTER slot_name.

~~~

8. src/test/subscription/t/028_disable_on_error.pl - filename

The 028 number needs to be bumped because there is already a TAP test
called 028 now

~~~

9. src/test/subscription/t/028_disable_on_error.pl - missing test

There was no test case for the last combination where the user correct
the apply worker problem: E.g. After a previous error/disable of the
subscriber, remove the index, publish the inserts again, and check
they get applied properly.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Optionally automatically disable logical replication subscriptions on error

2022-02-25 Thread osumi.takami...@fujitsu.com
On Tuesday, February 22, 2022 3:03 PM Peter Smith  wrote:
> Here are a couple more review comments for v21.
> 
> ~~~
> 
> 1. worker.c - comment
> 
> + subform = (Form_pg_subscription) GETSTRUCT(tup);
> +
> + /*
> + * We would not be here unless this subscription's disableonerr field
> + was
> + * true, but check whether that field has changed in the interim.
> + */
> + if (!subform->subdisableonerr)
> + {
> + heap_freetuple(tup);
> + table_close(rel, RowExclusiveLock);
> + CommitTransactionCommand();
> + return false;
> + }
> 
> I felt that comment belongs above the subform assignment because that is the
> only reason we are getting the subform again.
This part has been removed along with the modification
that we just disable the subscription in the main processing
when we get an error.

 
> ~~
> 
> 2. worker.c - subform->oid
> 
> + /* Notify the subscription will be no longer valid */ ereport(LOG,
> + errmsg("logical replication subscription \"%s\" will be disabled due
> to an error",
> +MySubscription->name));
> +
> + LockSharedObject(SubscriptionRelationId, subform->oid, 0,
> AccessExclusiveLock);
> 
> Can't we just use MySubscription->oid here? We really only needed that
> subform to get new option values.
Fixed.


> ~~
> 
> 3. worker.c - whitespace
> 
> Your pg_indent has also changed some whitespace for parts of worker.c that
> are completely unrelated to this patch. You might want to revert those 
> changes.
Fixed.

Kindly have a look at v22 that took in all your comments.
It's shared in [1].

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373D9B26F988307B0D3FE20ED3E9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2022-02-25 Thread osumi.takami...@fujitsu.com
On Wednesday, February 23, 2022 6:52 PM Tang, Haiying/唐 海英 
 wrote:
> I have a comment on v21 patch.
> 
> I wonder if we really need subscription s2 in 028_disable_on_error.pl. I 
> think for
> subscription s2, we only tested some normal cases(which could be tested with
> s1), and didn't test any error case, which means it wouldn't be automatically
> disabled.
> Is there any reason for creating subscription s2?
Removed the subscription s2.

This has reduced the code amount of TAP tests.
Kindly have a look at the v22 shared in [1].

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373D9B26F988307B0D3FE20ED3E9%40TYCPR01MB8373.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2022-02-25 Thread osumi.takami...@fujitsu.com
On Thursday, February 24, 2022 4:50 PM Masahiko Sawada  
wrote:
> On Tue, Feb 22, 2022 at 3:03 PM Peter Smith 
> wrote:
> >
> > ~~~
> >
> > 1. worker.c - comment
> >
> > + subform = (Form_pg_subscription) GETSTRUCT(tup);
> > +
> > + /*
> > + * We would not be here unless this subscription's disableonerr field
> > + was
> > + * true, but check whether that field has changed in the interim.
> > + */
> > + if (!subform->subdisableonerr)
> > + {
> > + heap_freetuple(tup);
> > + table_close(rel, RowExclusiveLock);
> > + CommitTransactionCommand();
> > + return false;
> > + }
> >
> > I felt that comment belongs above the subform assignment because that
> > is the only reason we are getting the subform again.
> 
> IIUC if we return false here, the same error will be emitted twice.
> And I'm not sure this check is really necessary. It would work only when the
> subdisableonerr is set to false concurrently, but doesn't work for the 
> opposite
> caces. I think we can check
> MySubscription->disableonerr and then just update the tuple.
Addressed. I followed your advice and deleted the check.


Kindly have a look at v22 shared in [1].


[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373D9B26F988307B0D3FE20ED3E9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2022-02-25 Thread osumi.takami...@fujitsu.com
On Thursday, February 24, 2022 8:09 PM Amit Kapila 
> On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada
>  wrote:
> > +   /*
> > +* Log the error that caused DisableSubscriptionOnError to be
> called. We
> > +* do this immediately so that it won't be lost if some other 
> > internal
> > +* error occurs in the following code.
> > +*/
> > +   EmitErrorReport();
> > +   AbortOutOfAnyTransaction();
> > +   FlushErrorState();
> >
> > Do we need to hold interrupts during cleanup here?
> >
> 
> I think so. We do prevent interrupts via
> HOLD_INTERRUPTS/RESUME_INTERRUPTS during cleanup.
Fixed.

Kindly have a look at v22 shared in [1].

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373D9B26F988307B0D3FE20ED3E9%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2022-02-25 Thread osumi.takami...@fujitsu.com
On Friday, February 25, 2022 12:57 PM Amit Kapila  
wrote:
> On Thu, Feb 24, 2022 at 6:30 PM Masahiko Sawada
>  wrote:
> >
> > On Thu, Feb 24, 2022 at 8:08 PM Amit Kapila 
> wrote:
> > >
> > > On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada
>  wrote:
> > > >
> > > > Here are some comments:
> > > >
> > > > Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()?
> > > >
> > >
> > > I have given this comment to move the related code to separate
> > > functions to slightly simplify ApplyWorkerMain() code but if you
> > > don't like we can move it back. I am not sure I like the new
> > > function names in the patch though.
> >
> > Okay, I'm fine with moving this code but perhaps we can find a better
> > function name as "Wrapper" seems slightly odd to me.
> >
> 
> Agreed.
> 
> > For example,
> > start_table_sync_start() and start_apply_changes() or something (it
> > seems we use the snake case for static functions in worker.c).
> >
> 
> I am fine with something on these lines, how about start_table_sync() and
> start_apply() respectively?
Adopted. (If we come up with better names, we can change those then)

Kindly have a look at attached the v22.
It has incorporated other improvements of TAP test,
refinement of the DisableSubscriptionOnError function and so on.

Best Regards,
Takamichi Osumi



v22-0001-Optionally-disable-subscriptions-on-error.patch
Description: v22-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2022-02-24 Thread Amit Kapila
On Thu, Feb 24, 2022 at 6:30 PM Masahiko Sawada  wrote:
>
> On Thu, Feb 24, 2022 at 8:08 PM Amit Kapila  wrote:
> >
> > On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada  
> > wrote:
> > >
> > > Here are some comments:
> > >
> > > Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()?
> > >
> >
> > I have given this comment to move the related code to separate
> > functions to slightly simplify ApplyWorkerMain() code but if you don't
> > like we can move it back. I am not sure I like the new function names
> > in the patch though.
>
> Okay, I'm fine with moving this code but perhaps we can find a better
> function name as "Wrapper" seems slightly odd to me.
>

Agreed.

> For example,
> start_table_sync_start() and start_apply_changes() or something (it
> seems we use the snake case for static functions in worker.c).
>

I am fine with something on these lines, how about start_table_sync()
and start_apply() respectively?

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-02-24 Thread Masahiko Sawada
On Thu, Feb 24, 2022 at 8:08 PM Amit Kapila  wrote:
>
> On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada  wrote:
> >
> > Here are some comments:
> >
> > Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()?
> >
>
> I have given this comment to move the related code to separate
> functions to slightly simplify ApplyWorkerMain() code but if you don't
> like we can move it back. I am not sure I like the new function names
> in the patch though.

Okay, I'm fine with moving this code but perhaps we can find a better
function name as "Wrapper" seems slightly odd to me. For example,
start_table_sync_start() and start_apply_changes() or something (it
seems we use the snake case for static functions in worker.c).

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Optionally automatically disable logical replication subscriptions on error

2022-02-24 Thread Amit Kapila
On Thu, Feb 24, 2022 at 1:20 PM Masahiko Sawada  wrote:
>
> Here are some comments:
>
> Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()?
>

I have given this comment to move the related code to separate
functions to slightly simplify ApplyWorkerMain() code but if you don't
like we can move it back. I am not sure I like the new function names
in the patch though.

> ---
> +   /*
> +* Log the error that caused DisableSubscriptionOnError to be called. 
> We
> +* do this immediately so that it won't be lost if some other internal
> +* error occurs in the following code.
> +*/
> +   EmitErrorReport();
> +   AbortOutOfAnyTransaction();
> +   FlushErrorState();
>
> Do we need to hold interrupts during cleanup here?
>

I think so. We do prevent interrupts via
HOLD_INTERRUPTS/RESUME_INTERRUPTS during cleanup.


-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2022-02-23 Thread Masahiko Sawada
On Tue, Feb 22, 2022 at 3:03 PM Peter Smith  wrote:
>
> ~~~
>
> 1. worker.c - comment
>
> + subform = (Form_pg_subscription) GETSTRUCT(tup);
> +
> + /*
> + * We would not be here unless this subscription's disableonerr field was
> + * true, but check whether that field has changed in the interim.
> + */
> + if (!subform->subdisableonerr)
> + {
> + heap_freetuple(tup);
> + table_close(rel, RowExclusiveLock);
> + CommitTransactionCommand();
> + return false;
> + }
>
> I felt that comment belongs above the subform assignment because that
> is the only reason we are getting the subform again.

IIUC if we return false here, the same error will be emitted twice.
And I'm not sure this check is really necessary. It would work only
when the subdisableonerr is set to false concurrently, but doesn't
work for the opposite caces. I think we can check
MySubscription->disableonerr and then just update the tuple.

Here are some comments:

Why do we need SyncTableStartWrapper() and ApplyLoopWrapper()?

---
+   /*
+* Log the error that caused DisableSubscriptionOnError to be called. We
+* do this immediately so that it won't be lost if some other internal
+* error occurs in the following code.
+*/
+   EmitErrorReport();
+   AbortOutOfAnyTransaction();
+   FlushErrorState();

Do we need to hold interrupts during cleanup here?

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Optionally automatically disable logical replication subscriptions on error

2022-02-23 Thread osumi.takami...@fujitsu.com
On Wednesday, February 23, 2022 6:52 PM Tang, Haiying/唐 海英 
 wrote:
> I have a comment on v21 patch.
> 
> I wonder if we really need subscription s2 in 028_disable_on_error.pl. I 
> think for
> subscription s2, we only tested some normal cases(which could be tested with
> s1), and didn't test any error case, which means it wouldn't be automatically
> disabled.
> Is there any reason for creating subscription s2?
Hi, thank you for your review !

It's for checking there's no impact/influence when disabling one subscription
on the other subscription if any.

*But*, when I have a look at the past tests to add options (e.g. streaming,
two_phase), we don't have this kind of test that I have for disable_on_error 
patch.
Therefore, I'd like to fix the test as you suggested in my next version.


Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2022-02-23 Thread tanghy.f...@fujitsu.com
Hi Osumi-san,

I have a comment on v21 patch.

I wonder if we really need subscription s2 in 028_disable_on_error.pl. I think
for subscription s2, we only tested some normal cases(which could be tested 
with s1), 
and didn't test any error case, which means it wouldn't be automatically 
disabled. 
Is there any reason for creating subscription s2?

Regards,
Tang


Re: Optionally automatically disable logical replication subscriptions on error

2022-02-21 Thread Peter Smith
On Tue, Feb 22, 2022 at 3:11 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Tuesday, February 22, 2022 7:53 AM Peter Smith  
> wrote:
> > On Mon, Feb 21, 2022 at 11:44 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Monday, February 21, 2022 2:56 PM Peter Smith
> >  wrote:
> > > > Thanks for addressing my previous comments. Now I have looked at v19.
> > > >
> > > > On Mon, Feb 21, 2022 at 11:25 AM osumi.takami...@fujitsu.com
> > > >  wrote:
> > > > >
> > > > > On Friday, February 18, 2022 3:27 PM Peter Smith
> > > >  wrote:
> > > > > > Hi. Below are my code review comments for v18.
> > > > > Thank you for your review !
> > > > ...
> > > > > > 5. src/backend/replication/logical/worker.c -
> > > > > > DisableSubscriptionOnError
> > > > > >
> > > > > > + /*
> > > > > > + * We would not be here unless this subscription's disableonerr
> > > > > > + field was
> > > > > > + * true when our worker began applying changes, but check
> > > > > > + whether that
> > > > > > + * field has changed in the interim.
> > > > > > + */
> > > > > >
> > > > > > Apparently, this function might just do nothing if it detects
> > > > > > some situation where the flag was changed somehow, but I'm not
> > > > > > 100% sure that the callers are properly catering for when nothing
> > happens.
> > > > > >
> > > > > > IMO it would be better if this function would return true/false
> > > > > > to mean "did disable subscription happen or not?" because that
> > > > > > will give the calling code the chance to check the function
> > > > > > return and do the right thing - e.g. if the caller first thought
> > > > > > it should be disabled but then
> > > > it turned out it did NOT disable...
> > > > > I don't think we need to do something more.
> > > > > After this function, table sync worker and the apply worker just exit.
> > > > > IMO, we don't need to do additional work for already-disabled
> > > > > subscription on the caller sides.
> > > > > It should be sufficient to fulfill the purpose of
> > > > > DisableSubscriptionOnError or confirm it has been fulfilled.
> > > >
> > > > Hmmm -  Yeah, it may be the workers might just exit soon after
> > > > anyhow as you say so everything comes out in the wash, but still, I
> > > > felt for this case when DisableSubscriptionOnError turned out to do
> > > > nothing it would be better to exit via the existing logic. And that is 
> > > > easy to do
> > if the function returns true/false.
> > > >
> > > > For example, changes like below seemed neater code to me. YMMV.
> > > >
> > > > BEFORE (SyncTableStartWrapper):
> > > > if (MySubscription->disableonerr)
> > > > {
> > > > DisableSubscriptionOnError();
> > > > proc_exit(0);
> > > > }
> > > > AFTER
> > > > if (MySubscription->disableonerr && DisableSubscriptionOnError())
> > > > proc_exit(0);
> > > >
> > > > BEFORE (ApplyLoopWrapper)
> > > > if (MySubscription->disableonerr)
> > > > {
> > > > /* Disable the subscription */
> > > > DisableSubscriptionOnError();
> > > > return;
> > > > }
> > > > AFTER
> > > > if (MySubscription->disableonerr && DisableSubscriptionOnError())
> > > > return;
> > > Okay, so this return value works for better readability.
> > > Fixed.
> > >
> > >
> > > > ~~~
> > > >
> > > > Here are a couple more comments:
> > > >
> > > > 1. src/backend/replication/logical/worker.c -
> > > > DisableSubscriptionOnError, Refactor error handling
> > > >
> > > > (this comment assumes the above gets changed too)
> > > I think those are independent.
> >
> > OK. I was only curious if the change #5 above might cause the error to be 
> > logged
> > 2x, if the DisableSubscriptionOnError returns false.
> > - firstly, when it logs errors within the function
> > - secondly, by normal error mechanism when the caller re-throws it.
> >
> > But, if you are sure that won't happen then it is good news.
> I didn't feel this would become a substantial issue.
>
> When we alter subscription with disable_on_error = false
> after we go into the DisableSubscriptionOnError,
> we don't disable the subscription in the same function.
> That means we launch new apply workers repeatedly after that
> until we solve the error cause or we set the disable_on_error = true again.
>
> So, if we confirm that the disable_on_error = false in the 
> DisableSubscriptionOnError,
> it's highly possible that we'll get more same original errors from new apply 
> workers.
>
> This leads to another question, we should suppress the 2nd error(if there is),
> even when it's highly possible that we'll get more same errors by new apply 
> workers
> created repeatedly or not. I wasn't sure if the implementation complexity for 
> this wins the log print.
>
> So, kindly let me keep the current code as is.
> If someone wants share his/her opinion on this, please let me know.

OK, but is it really correct that this scenario can happen "When we
alter subscription with disable_on_error = false after we go into the
DisableSubscriptionOnError". Actually, I thought this window may be
much bigger 

RE: Optionally automatically disable logical replication subscriptions on error

2022-02-21 Thread osumi.takami...@fujitsu.com
On Tuesday, February 22, 2022 7:53 AM Peter Smith  wrote:
> On Mon, Feb 21, 2022 at 11:44 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Monday, February 21, 2022 2:56 PM Peter Smith
>  wrote:
> > > Thanks for addressing my previous comments. Now I have looked at v19.
> > >
> > > On Mon, Feb 21, 2022 at 11:25 AM osumi.takami...@fujitsu.com
> > >  wrote:
> > > >
> > > > On Friday, February 18, 2022 3:27 PM Peter Smith
> > >  wrote:
> > > > > Hi. Below are my code review comments for v18.
> > > > Thank you for your review !
> > > ...
> > > > > 5. src/backend/replication/logical/worker.c -
> > > > > DisableSubscriptionOnError
> > > > >
> > > > > + /*
> > > > > + * We would not be here unless this subscription's disableonerr
> > > > > + field was
> > > > > + * true when our worker began applying changes, but check
> > > > > + whether that
> > > > > + * field has changed in the interim.
> > > > > + */
> > > > >
> > > > > Apparently, this function might just do nothing if it detects
> > > > > some situation where the flag was changed somehow, but I'm not
> > > > > 100% sure that the callers are properly catering for when nothing
> happens.
> > > > >
> > > > > IMO it would be better if this function would return true/false
> > > > > to mean "did disable subscription happen or not?" because that
> > > > > will give the calling code the chance to check the function
> > > > > return and do the right thing - e.g. if the caller first thought
> > > > > it should be disabled but then
> > > it turned out it did NOT disable...
> > > > I don't think we need to do something more.
> > > > After this function, table sync worker and the apply worker just exit.
> > > > IMO, we don't need to do additional work for already-disabled
> > > > subscription on the caller sides.
> > > > It should be sufficient to fulfill the purpose of
> > > > DisableSubscriptionOnError or confirm it has been fulfilled.
> > >
> > > Hmmm -  Yeah, it may be the workers might just exit soon after
> > > anyhow as you say so everything comes out in the wash, but still, I
> > > felt for this case when DisableSubscriptionOnError turned out to do
> > > nothing it would be better to exit via the existing logic. And that is 
> > > easy to do
> if the function returns true/false.
> > >
> > > For example, changes like below seemed neater code to me. YMMV.
> > >
> > > BEFORE (SyncTableStartWrapper):
> > > if (MySubscription->disableonerr)
> > > {
> > > DisableSubscriptionOnError();
> > > proc_exit(0);
> > > }
> > > AFTER
> > > if (MySubscription->disableonerr && DisableSubscriptionOnError())
> > > proc_exit(0);
> > >
> > > BEFORE (ApplyLoopWrapper)
> > > if (MySubscription->disableonerr)
> > > {
> > > /* Disable the subscription */
> > > DisableSubscriptionOnError();
> > > return;
> > > }
> > > AFTER
> > > if (MySubscription->disableonerr && DisableSubscriptionOnError())
> > > return;
> > Okay, so this return value works for better readability.
> > Fixed.
> >
> >
> > > ~~~
> > >
> > > Here are a couple more comments:
> > >
> > > 1. src/backend/replication/logical/worker.c -
> > > DisableSubscriptionOnError, Refactor error handling
> > >
> > > (this comment assumes the above gets changed too)
> > I think those are independent.
> 
> OK. I was only curious if the change #5 above might cause the error to be 
> logged
> 2x, if the DisableSubscriptionOnError returns false.
> - firstly, when it logs errors within the function
> - secondly, by normal error mechanism when the caller re-throws it.
> 
> But, if you are sure that won't happen then it is good news.
I didn't feel this would become a substantial issue.

When we alter subscription with disable_on_error = false
after we go into the DisableSubscriptionOnError,
we don't disable the subscription in the same function.
That means we launch new apply workers repeatedly after that
until we solve the error cause or we set the disable_on_error = true again.

So, if we confirm that the disable_on_error = false in the 
DisableSubscriptionOnError,
it's highly possible that we'll get more same original errors from new apply 
workers.

This leads to another question, we should suppress the 2nd error(if there is),
even when it's highly possible that we'll get more same errors by new apply 
workers
created repeatedly or not. I wasn't sure if the implementation complexity for 
this wins the log print.

So, kindly let me keep the current code as is.
If someone wants share his/her opinion on this, please let me know.
 
> >
> >
> > > +static void
> > > +DisableSubscriptionOnError(void)
> > > +{
> > > + Relation rel;
> > > + bool nulls[Natts_pg_subscription];  bool
> > > +replaces[Natts_pg_subscription];  Datum
> > > +values[Natts_pg_subscription];  HeapTuple tup;
> > > +Form_pg_subscription subform;
> > > +
> > > + /* Emit the error */
> > > + EmitErrorReport();
> > > + /* Abort any active transaction */ AbortOutOfAnyTransaction();
> > > + /* Reset the ErrorContext */
> > > + FlushErrorState();
> > > +
> > > + /* Disable the 

Re: Optionally automatically disable logical replication subscriptions on error

2022-02-21 Thread Peter Smith
On Mon, Feb 21, 2022 at 11:44 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, February 21, 2022 2:56 PM Peter Smith  
> wrote:
> > Thanks for addressing my previous comments. Now I have looked at v19.
> >
> > On Mon, Feb 21, 2022 at 11:25 AM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Friday, February 18, 2022 3:27 PM Peter Smith
> >  wrote:
> > > > Hi. Below are my code review comments for v18.
> > > Thank you for your review !
> > ...
> > > > 5. src/backend/replication/logical/worker.c -
> > > > DisableSubscriptionOnError
> > > >
> > > > + /*
> > > > + * We would not be here unless this subscription's disableonerr
> > > > + field was
> > > > + * true when our worker began applying changes, but check whether
> > > > + that
> > > > + * field has changed in the interim.
> > > > + */
> > > >
> > > > Apparently, this function might just do nothing if it detects some
> > > > situation where the flag was changed somehow, but I'm not 100% sure
> > > > that the callers are properly catering for when nothing happens.
> > > >
> > > > IMO it would be better if this function would return true/false to
> > > > mean "did disable subscription happen or not?" because that will
> > > > give the calling code the chance to check the function return and do
> > > > the right thing - e.g. if the caller first thought it should be 
> > > > disabled but then
> > it turned out it did NOT disable...
> > > I don't think we need to do something more.
> > > After this function, table sync worker and the apply worker just exit.
> > > IMO, we don't need to do additional work for already-disabled
> > > subscription on the caller sides.
> > > It should be sufficient to fulfill the purpose of
> > > DisableSubscriptionOnError or confirm it has been fulfilled.
> >
> > Hmmm -  Yeah, it may be the workers might just exit soon after anyhow as you
> > say so everything comes out in the wash, but still, I felt for this case 
> > when
> > DisableSubscriptionOnError turned out to do nothing it would be better to 
> > exit
> > via the existing logic. And that is easy to do if the function returns 
> > true/false.
> >
> > For example, changes like below seemed neater code to me. YMMV.
> >
> > BEFORE (SyncTableStartWrapper):
> > if (MySubscription->disableonerr)
> > {
> > DisableSubscriptionOnError();
> > proc_exit(0);
> > }
> > AFTER
> > if (MySubscription->disableonerr && DisableSubscriptionOnError())
> > proc_exit(0);
> >
> > BEFORE (ApplyLoopWrapper)
> > if (MySubscription->disableonerr)
> > {
> > /* Disable the subscription */
> > DisableSubscriptionOnError();
> > return;
> > }
> > AFTER
> > if (MySubscription->disableonerr && DisableSubscriptionOnError()) return;
> Okay, so this return value works for better readability.
> Fixed.
>
>
> > ~~~
> >
> > Here are a couple more comments:
> >
> > 1. src/backend/replication/logical/worker.c - DisableSubscriptionOnError,
> > Refactor error handling
> >
> > (this comment assumes the above gets changed too)
> I think those are independent.

OK. I was only curious if the change #5 above might cause the error to
be logged 2x, if the DisableSubscriptionOnError returns false.
- firstly, when it logs errors within the function
- secondly, by normal error mechanism when the caller re-throws it.

But, if you are sure that won't happen then it is good news.

>
>
> > +static void
> > +DisableSubscriptionOnError(void)
> > +{
> > + Relation rel;
> > + bool nulls[Natts_pg_subscription];
> > + bool replaces[Natts_pg_subscription];
> > + Datum values[Natts_pg_subscription];
> > + HeapTuple tup;
> > + Form_pg_subscription subform;
> > +
> > + /* Emit the error */
> > + EmitErrorReport();
> > + /* Abort any active transaction */
> > + AbortOutOfAnyTransaction();
> > + /* Reset the ErrorContext */
> > + FlushErrorState();
> > +
> > + /* Disable the subscription in a fresh transaction */
> > + StartTransactionCommand();
> >
> > If this DisableSubscriptionOnError function decides later that actually the
> > 'disableonerr' flag is false (i.e. it's NOT going to disable the 
> > subscription after
> > all) then IMO it make more sense that the error logging for that case 
> > should just
> > do whatever it is doing now by the normal error processing mechanism.
> >
> > In other words, I thought perhaps the code to EmitErrorReport/FlushError 
> > state
> > etc be moved to be BELOW the if
> > (!subform->subdisableonerr) bail-out code?
> >
> > Please see what you think in my attached POC [1]. It seems neater to me, and
> > tests are all OK. Maybe I am mistaken...
> I had a concern that this order change of codes would have a negative
> impact when we have another new error during the call of 
> DisableSubscriptionOnError.
>
> With the debugger, I raised an error in this function before emitting the 
> original error.
> As a result, the original error that makes the apply worker go into the path 
> of
> DisableSubscriptionOnError (in my test, duplication error) has vanished.
> In this sense, v19 looks safer, and the 

RE: Optionally automatically disable logical replication subscriptions on error

2022-02-21 Thread osumi.takami...@fujitsu.com
On Monday, February 21, 2022 2:56 PM Peter Smith  wrote:
> Thanks for addressing my previous comments. Now I have looked at v19.
> 
> On Mon, Feb 21, 2022 at 11:25 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Friday, February 18, 2022 3:27 PM Peter Smith
>  wrote:
> > > Hi. Below are my code review comments for v18.
> > Thank you for your review !
> ...
> > > 5. src/backend/replication/logical/worker.c -
> > > DisableSubscriptionOnError
> > >
> > > + /*
> > > + * We would not be here unless this subscription's disableonerr
> > > + field was
> > > + * true when our worker began applying changes, but check whether
> > > + that
> > > + * field has changed in the interim.
> > > + */
> > >
> > > Apparently, this function might just do nothing if it detects some
> > > situation where the flag was changed somehow, but I'm not 100% sure
> > > that the callers are properly catering for when nothing happens.
> > >
> > > IMO it would be better if this function would return true/false to
> > > mean "did disable subscription happen or not?" because that will
> > > give the calling code the chance to check the function return and do
> > > the right thing - e.g. if the caller first thought it should be disabled 
> > > but then
> it turned out it did NOT disable...
> > I don't think we need to do something more.
> > After this function, table sync worker and the apply worker just exit.
> > IMO, we don't need to do additional work for already-disabled
> > subscription on the caller sides.
> > It should be sufficient to fulfill the purpose of
> > DisableSubscriptionOnError or confirm it has been fulfilled.
> 
> Hmmm -  Yeah, it may be the workers might just exit soon after anyhow as you
> say so everything comes out in the wash, but still, I felt for this case when
> DisableSubscriptionOnError turned out to do nothing it would be better to exit
> via the existing logic. And that is easy to do if the function returns 
> true/false.
> 
> For example, changes like below seemed neater code to me. YMMV.
> 
> BEFORE (SyncTableStartWrapper):
> if (MySubscription->disableonerr)
> {
> DisableSubscriptionOnError();
> proc_exit(0);
> }
> AFTER
> if (MySubscription->disableonerr && DisableSubscriptionOnError())
> proc_exit(0);
> 
> BEFORE (ApplyLoopWrapper)
> if (MySubscription->disableonerr)
> {
> /* Disable the subscription */
> DisableSubscriptionOnError();
> return;
> }
> AFTER
> if (MySubscription->disableonerr && DisableSubscriptionOnError()) return;
Okay, so this return value works for better readability.
Fixed.

 
> ~~~
> 
> Here are a couple more comments:
> 
> 1. src/backend/replication/logical/worker.c - DisableSubscriptionOnError,
> Refactor error handling
> 
> (this comment assumes the above gets changed too)
I think those are independent.


> +static void
> +DisableSubscriptionOnError(void)
> +{
> + Relation rel;
> + bool nulls[Natts_pg_subscription];
> + bool replaces[Natts_pg_subscription];
> + Datum values[Natts_pg_subscription];
> + HeapTuple tup;
> + Form_pg_subscription subform;
> +
> + /* Emit the error */
> + EmitErrorReport();
> + /* Abort any active transaction */
> + AbortOutOfAnyTransaction();
> + /* Reset the ErrorContext */
> + FlushErrorState();
> +
> + /* Disable the subscription in a fresh transaction */
> + StartTransactionCommand();
> 
> If this DisableSubscriptionOnError function decides later that actually the
> 'disableonerr' flag is false (i.e. it's NOT going to disable the subscription 
> after
> all) then IMO it make more sense that the error logging for that case should 
> just
> do whatever it is doing now by the normal error processing mechanism.
> 
> In other words, I thought perhaps the code to EmitErrorReport/FlushError state
> etc be moved to be BELOW the if
> (!subform->subdisableonerr) bail-out code?
> 
> Please see what you think in my attached POC [1]. It seems neater to me, and
> tests are all OK. Maybe I am mistaken...
I had a concern that this order change of codes would have a negative
impact when we have another new error during the call of 
DisableSubscriptionOnError.

With the debugger, I raised an error in this function before emitting the 
original error.
As a result, the original error that makes the apply worker go into the path of
DisableSubscriptionOnError (in my test, duplication error) has vanished.
In this sense, v19 looks safer, and the current order to handle error recovery 
first
looks better to me.

FYI, after the 2nd debugger error,
the next new apply worker created quickly met the same type of error,
went into the same path, and disabled the subscription with the log.
But, it won't be advisable to let the possibility left.

> ~~~
> 
> 2. Commit message - wording
> 
> Logical replication apply workers for a subscription can easily get stuck in 
> an
> infinite loop of attempting to apply a change, triggering an error (such as a
> constraint violation), exiting with an error written to the subscription 
> worker log,
> and restarting.
> 
> SUGGESTION
> 

Re: Optionally automatically disable logical replication subscriptions on error

2022-02-20 Thread Peter Smith
Thanks for addressing my previous comments. Now I have looked at v19.

On Mon, Feb 21, 2022 at 11:25 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Friday, February 18, 2022 3:27 PM Peter Smith  
> wrote:
> > Hi. Below are my code review comments for v18.
> Thank you for your review !
...
> > 5. src/backend/replication/logical/worker.c - DisableSubscriptionOnError
> >
> > + /*
> > + * We would not be here unless this subscription's disableonerr field
> > + was
> > + * true when our worker began applying changes, but check whether that
> > + * field has changed in the interim.
> > + */
> >
> > Apparently, this function might just do nothing if it detects some situation
> > where the flag was changed somehow, but I'm not 100% sure that the callers
> > are properly catering for when nothing happens.
> >
> > IMO it would be better if this function would return true/false to mean "did
> > disable subscription happen or not?" because that will give the calling 
> > code the
> > chance to check the function return and do the right thing - e.g. if the 
> > caller first
> > thought it should be disabled but then it turned out it did NOT disable...
> I don't think we need to do something more.
> After this function, table sync worker and the apply worker
> just exit. IMO, we don't need to do additional work for
> already-disabled subscription on the caller sides.
> It should be sufficient to fulfill the purpose of
> DisableSubscriptionOnError or confirm it has been fulfilled.

Hmmm -  Yeah, it may be the workers might just exit soon after anyhow
as you say so everything comes out in the wash, but still, I felt for
this case when DisableSubscriptionOnError turned out to do nothing it
would be better to exit via the existing logic. And that is easy to do
if the function returns true/false.

For example, changes like below seemed neater code to me. YMMV.

BEFORE (SyncTableStartWrapper):
if (MySubscription->disableonerr)
{
DisableSubscriptionOnError();
proc_exit(0);
}
AFTER
if (MySubscription->disableonerr && DisableSubscriptionOnError())
proc_exit(0);

BEFORE (ApplyLoopWrapper)
if (MySubscription->disableonerr)
{
/* Disable the subscription */
DisableSubscriptionOnError();
return;
}
AFTER
if (MySubscription->disableonerr && DisableSubscriptionOnError())
return;

~~~

Here are a couple more comments:

1. src/backend/replication/logical/worker.c -
DisableSubscriptionOnError, Refactor error handling

(this comment assumes the above gets changed too)

+static void
+DisableSubscriptionOnError(void)
+{
+ Relation rel;
+ bool nulls[Natts_pg_subscription];
+ bool replaces[Natts_pg_subscription];
+ Datum values[Natts_pg_subscription];
+ HeapTuple tup;
+ Form_pg_subscription subform;
+
+ /* Emit the error */
+ EmitErrorReport();
+ /* Abort any active transaction */
+ AbortOutOfAnyTransaction();
+ /* Reset the ErrorContext */
+ FlushErrorState();
+
+ /* Disable the subscription in a fresh transaction */
+ StartTransactionCommand();

If this DisableSubscriptionOnError function decides later that
actually the 'disableonerr' flag is false (i.e. it's NOT going to
disable the subscription after all) then IMO it make more sense that
the error logging for that case should just do whatever it is doing
now by the normal error processing mechanism.

In other words, I thought perhaps the code to
EmitErrorReport/FlushError state etc be moved to be BELOW the if
(!subform->subdisableonerr) bail-out code?

Please see what you think in my attached POC [1]. It seems neater to
me, and tests are all OK. Maybe I am mistaken...

~~~

2. Commit message - wording

Logical replication apply workers for a subscription can easily get
stuck in an infinite loop of attempting to apply a change,
triggering an error (such as a constraint violation), exiting with
an error written to the subscription worker log, and restarting.

SUGGESTION
"exiting with an error written" --> "exiting with the error written"

--
[1] peter-v19-poc.diff - POC just to try some of my suggestions above
to make sure all tests still pass ok.

Kind Regards,
Peter Smith.
Fujitsu Australia.
diff --git a/src/backend/replication/logical/worker.c 
b/src/backend/replication/logical/worker.c
index a3c240f..4fe6b7f 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2805,7 +2805,7 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 /*
  * Disable the current subscription, after error recovery processing.
  */
-static void
+static bool
 DisableSubscriptionOnError(void)
 {
Relationrel;
@@ -2815,14 +2815,8 @@ DisableSubscriptionOnError(void)
HeapTuple   tup;
Form_pg_subscription subform;
 
-   /* Emit the error */
-   EmitErrorReport();
-   /* Abort any active transaction */
-   AbortOutOfAnyTransaction();
-   /* Reset the ErrorContext */
-   FlushErrorState();
-
/* Disable the subscription in a fresh transaction */
+   AbortOutOfAnyTransaction();

RE: Optionally automatically disable logical replication subscriptions on error

2022-02-20 Thread osumi.takami...@fujitsu.com
On Friday, February 18, 2022 3:27 PM Peter Smith  wrote:
> Hi. Below are my code review comments for v18.
Thank you for your review !

> ==
> 
> 1. Commit Message - wording
> 
> BEFORE
> To partially remedy the situation, adding a new subscription_parameter named
> 'disable_on_error'.
> 
> AFTER
> To partially remedy the situation, this patch adds a new
> subscription_parameter named 'disable_on_error'.
Fixed.

> ~~~
> 
> 2. Commit message - wording
> 
> BEFORE
> Require to bump catalog version.
> 
> AFTER
> A catalog version bump is required.
Fixed.

> ~~~
> 
> 3. doc/src/sgml/ref/alter_subscription.sgml - whitespace
> 
> @@ -201,8 +201,8 @@ ALTER SUBSCRIPTION  class="parameter">name RENAME TO <
>information.  The parameters that can be altered
>are slot_name,
>synchronous_commit,
> -  binary, and
> -  streaming.
> +  binary,streaming, and
> +  disable_on_error.
>   
> 
> There is a missing space before streaming.
Fixed. 



> ~~~
> 
> 4. src/backend/replication/logical/worker.c - WorkerErrorRecovery
> 
> @@ -2802,6 +2803,89 @@ LogicalRepApplyLoop(XLogRecPtr
> last_received)  }
> 
>  /*
> + * Worker error recovery processing, in preparation for disabling the
> + * subscription.
> + */
> +static void
> +WorkerErrorRecovery(void)
> 
> I was wondering about the need for this to be a separate function? It is only
> called immediately before calling 'DisableSubscriptionOnError'
> so would it maybe be better just to put this code inside
> DisableSubscriptionOnError with the appropriate comments?
I preferred to have one specific for error handling,
because from caller sides, when we catch error, it's apparent
that error recovery is done. But, the function name "DisableSubscriptionOnError"
by itself should have the nuance that we do something on error.
So, we can think that it's okay to have error recovery processing
in this function.

So, I removed the function and fixed some related comments.


> ~~~
> 
> 5. src/backend/replication/logical/worker.c - DisableSubscriptionOnError
> 
> + /*
> + * We would not be here unless this subscription's disableonerr field
> + was
> + * true when our worker began applying changes, but check whether that
> + * field has changed in the interim.
> + */
> 
> Apparently, this function might just do nothing if it detects some situation
> where the flag was changed somehow, but I'm not 100% sure that the callers
> are properly catering for when nothing happens.
> 
> IMO it would be better if this function would return true/false to mean "did
> disable subscription happen or not?" because that will give the calling code 
> the
> chance to check the function return and do the right thing - e.g. if the 
> caller first
> thought it should be disabled but then it turned out it did NOT disable...
I don't think we need to do something more.
After this function, table sync worker and the apply worker
just exit. IMO, we don't need to do additional work for
already-disabled subscription on the caller sides.
It should be sufficient to fulfill the purpose of
DisableSubscriptionOnError or confirm it has been fulfilled.


> ~~~
> 
> 6. src/backend/replication/logical/worker.c - LogicalRepHandleTableSync
> name
> 
> +/*
> + * Execute the initial sync with error handling. Disable the
> +subscription,
> + * if it's required.
> + */
> +static void
> +LogicalRepHandleTableSync(XLogRecPtr *origin_startpos,
> +   char **myslotname, MemoryContext cctx)
> 
> I felt that it is a bit overkill to put a "LogicalRep" prefix here because it 
> is a static
> function.
> 
> IMO this function should be renamed as 'SyncTableStartWrapper' because that
> describes better what it is doing.
Makes sense. Fixed.


> ~~~
> 
> 7. src/backend/replication/logical/worker.c - LogicalRepHandleTableSync
> Assert
> 
> Even though we can know this to be true because of where it is called from, I
> think the readability of the function will be improved if you add an 
> assertion at
> the top:
> 
> Assert(am_tablesync_worker());
Fixed.

> And then, because the function is clearly for Tablesync worker only there is 
> no
> need to keep mentioning that in the subsequent comments...
> 
> e.g.1
> /* This is table synchronization worker, call initial sync. */
> AFTER:
> /* Call initial sync. */
Fixed.

> e.g.2
> /*
>  * Report the table sync error. There is no corresponding message type
>  * for table synchronization.
>  */
> AFTER
> /*
>  * Report the error. There is no corresponding message type for table
>  * synchronization.
>  */
Agreed. Fixed


> ~~~
> 
> 8. src/backend/replication/logical/worker.c - LogicalRepHandleTableSync
> unnecessarily complex
> 
> +static void
> +LogicalRepHandleTableSync(XLogRecPtr *origin_startpos,
> +   char **myslotname, MemoryContext cctx) {
> + char*syncslotname;
> + bool error_recovery_done = false;
> 
> IMO this logic is way more complex than it needed to be. IIUC that
> 'error_recovery_done' and various conditions can be removed, and the whole

RE: Optionally automatically disable logical replication subscriptions on error

2022-02-16 Thread osumi.takami...@fujitsu.com
On Tuesday, February 15, 2022 2:19 PM I wrote
> On Monday, February 14, 2022 8:58 PM Amit Kapila
> > 2. Can we move the code related to tablesync worker and its error
> > handing (the code insider if (am_tablesync_worker())) to a separate
> > function say
> > LogicalRepHandleTableSync() or something like that.
> >
> > 3. Similarly, we can move apply-loop related code ("Run the main
> > loop.") to a separate function say LogicalRepHandleApplyMessages().
> >
> > If we do (2) and (3), I think the code in ApplyWorkerMain will look
> > better. What do you think?
> I agree with (2) and (3), since those contribute to better readability.
> 
> Attached a new patch v17 that addresses those refactorings.
Hi, I noticed that one new tap test was added in the src/test/subscription/
and needed to increment the number of my test of this patch.

Also, I conducted minor fixes of comments and function name.
Kindly have a look at the attached v18.

Best Regards,
Takamichi Osumi



v18-0001-Optionally-disable-subscriptions-on-error.patch
Description: v18-0001-Optionally-disable-subscriptions-on-error.patch


RE: Optionally automatically disable logical replication subscriptions on error

2022-02-14 Thread osumi.takami...@fujitsu.com
On Monday, February 14, 2022 8:58 PM Amit Kapila  
wrote:
> On Thu, Jan 6, 2022 at 11:23 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Kindly have a look at the attached v16.
> >
> 
> Few comments:
Hi, thank you for checking the patch !

> =
> 1.
> @@ -3594,13 +3698,29 @@ ApplyWorkerMain(Datum main_arg)
> apply_error_callback_arg.command,
> apply_error_callback_arg.remote_xid,
> errdata->message);
> - MemoryContextSwitchTo(ecxt);
> +
> + if (!MySubscription->disableonerr)
> + {
> + /*
> + * Some work in error recovery work is done. Switch to the old
> + * memory context and rethrow.
> + */
> + MemoryContextSwitchTo(ecxt);
> + PG_RE_THROW();
> + }
>   }
> + else if (!MySubscription->disableonerr) PG_RE_THROW();
> 
> - PG_RE_THROW();
> 
> Can't we combine these two different checks for
> 'MySubscription->disableonerr' if you do it as a separate if check after 
> sending
> the stats message?
No, we can't. The second check of MySubscription->disableonerr is for the case
apply_error_callback_arg.command equals 0. We disable the subscription
on any errors. In other words, we need to rethrow the error in the case,
if the flag disableonerr is not set to true.

So, moving it to after sending
the stats message can't be done. At the same time, if we move
the disableonerr flag check outside of the apply_error_callback_arg.command 
condition
branch, we need to write another call of pgstat_report_subworker_error, with the
same arguments that we have now. This wouldn't be preferrable as well.

> 
> 2. Can we move the code related to tablesync worker and its error handing (the
> code insider if (am_tablesync_worker())) to a separate function say
> LogicalRepHandleTableSync() or something like that.
> 
> 3. Similarly, we can move apply-loop related code ("Run the main
> loop.") to a separate function say LogicalRepHandleApplyMessages().
> 
> If we do (2) and (3), I think the code in ApplyWorkerMain will look better. 
> What
> do you think?
I agree with (2) and (3), since those contribute to better readability.

Attached a new patch v17 that addresses those refactorings.


Best Regards,
Takamichi Osumi



v17-0001-Optionally-disable-subscriptions-on-error.patch
Description: v17-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2022-02-14 Thread Amit Kapila
On Thu, Jan 6, 2022 at 11:23 AM osumi.takami...@fujitsu.com
 wrote:
>
> Kindly have a look at the attached v16.
>

Few comments:
=
1.
@@ -3594,13 +3698,29 @@ ApplyWorkerMain(Datum main_arg)
apply_error_callback_arg.command,
apply_error_callback_arg.remote_xid,
errdata->message);
- MemoryContextSwitchTo(ecxt);
+
+ if (!MySubscription->disableonerr)
+ {
+ /*
+ * Some work in error recovery work is done. Switch to the old
+ * memory context and rethrow.
+ */
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
  }
+ else if (!MySubscription->disableonerr)
+ PG_RE_THROW();

- PG_RE_THROW();

Can't we combine these two different checks for
'MySubscription->disableonerr' if you do it as a separate if check
after sending the stats message?

2. Can we move the code related to tablesync worker and its error
handing (the code insider if (am_tablesync_worker())) to a separate
function say LogicalRepHandleTableSync() or something like that.

3. Similarly, we can move apply-loop related code ("Run the main
loop.") to a separate function say LogicalRepHandleApplyMessages().

If we do (2) and (3), I think the code in ApplyWorkerMain will look
better. What do you think?

-- 
With Regards,
Amit Kapila.




RE: Optionally automatically disable logical replication subscriptions on error

2022-01-05 Thread osumi.takami...@fujitsu.com
On Thursday, January 6, 2022 12:17 PM Tang, Haiying/唐 海英 
 wrote:
> Thanks for updating the patch. Here are some comments:
Thank you for your review !

> 1)
> + /*
> +  * We would not be here unless this subscription's disableonerr field
> was
> +  * true when our worker began applying changes, but check whether
> that
> +  * field has changed in the interim.
> +  */
> + if (!subform->subdisableonerr)
> + {
> + /*
> +  * Disabling the subscription has been done already. No need
> of
> +  * additional work.
> +  */
> + heap_freetuple(tup);
> + table_close(rel, RowExclusiveLock);
> + CommitTransactionCommand();
> + return;
> + }
> 
> I don't understand what does "Disabling the subscription has been done
> already"
> mean, I think we only run here when subdisableonerr is changed in the interim.
> Should we modify this comment? Or remove it because there are already some
> explanations before.
Removed. The description you pointed out was redundant.

> 2)
> + /* Set the subscription to disabled, and note the reason. */
> + values[Anum_pg_subscription_subenabled - 1] =
> BoolGetDatum(false);
> + replaces[Anum_pg_subscription_subenabled - 1] = true;
> 
> I didn't see the code corresponding to "note the reason". Should we modify the
> comment?
Fixed the comment by removing the part.
We come here when an error occurred and the reason is printed as log
so no need to note more reason.

> 3)
> + booldisableonerr;   /* Whether errors automatically
> disable */
> 
> This comment is hard to understand. Maybe it can be changed to:
> 
> Indicates if the subscription should be automatically disabled when
> subscription workers detect any errors.
Agreed. Fixed.

Kindly have a look at the attached v16.

Best Regards,
Takamichi Osumi



v16-0001-Optionally-disable-subscriptions-on-error.patch
Description: v16-0001-Optionally-disable-subscriptions-on-error.patch


RE: Optionally automatically disable logical replication subscriptions on error

2022-01-05 Thread tanghy.f...@fujitsu.com
On Wednesday, January 5, 2022 8:53 PM osumi.takami...@fujitsu.com 
 wrote:
> 
> On Tuesday, December 28, 2021 11:53 AM Wang, Wei/王 威
>  wrote:
> > On Thursday, December 16, 2021 8:51 PM osumi.takami...@fujitsu.com
> >  wrote:
> > > Attached the updated patch v14.
> >
> > A comment to the timing of printing a log:
> Thank you for your review !
> 
> > After the log[1] was printed, I altered subscription's option
> > (DISABLE_ON_ERROR) from true to false before invoking
> > DisableSubscriptionOnError to disable subscription. Subscription was not
> > disabled.
> > [1] "LOG:  logical replication subscription "sub1" will be disabled due to 
> > an
> > error"
> >
> > I found this log is printed in function WorkerErrorRecovery:
> > +   ereport(LOG,
> > +   errmsg("logical replication subscription \"%s\" will
> > be disabled due to an error",
> > +  MySubscription->name));
> > This log is printed here, but in DisableSubscriptionOnError, there is a 
> > check to
> > confirm subscription's disableonerr field. If disableonerr is found changed 
> > from
> > true to false in DisableSubscriptionOnError, subscription will not be 
> > disabled.
> >
> > In this case, "disable subscription" is printed, but subscription will not 
> > be
> > disabled actually.
> > I think it is a little confused to user, so what about moving this message 
> > after
> > the check which is mentioned above in DisableSubscriptionOnError?
> Makes sense. I moved the log print after
> the check of the necessity to disable the subscription.
> 
> Also, I've scrutinized and refined the new TAP test as well for refactoring.
> As a result, I fixed wait_for_subscriptions()
> so that some extra codes that can be simplified,
> such as escaped variable and one part of WHERE clause, are removed.
> Other change I did is to replace two calls of wait_for_subscriptions()
> with polling_query_until() for the subscriber, in order to
> make the tests better and more suitable for the test purposes.
> Again, for this refinement, I've conducted a tight loop test
> to check no timing issue and found no problem.
> 

Thanks for updating the patch. Here are some comments:

1)
+   /*
+* We would not be here unless this subscription's disableonerr field 
was
+* true when our worker began applying changes, but check whether that
+* field has changed in the interim.
+*/
+   if (!subform->subdisableonerr)
+   {
+   /*
+* Disabling the subscription has been done already. No need of
+* additional work.
+*/
+   heap_freetuple(tup);
+   table_close(rel, RowExclusiveLock);
+   CommitTransactionCommand();
+   return;
+   }

I don't understand what does "Disabling the subscription has been done already"
mean, I think we only run here when subdisableonerr is changed in the interim.
Should we modify this comment? Or remove it because there are already some
explanations before.

2)
+   /* Set the subscription to disabled, and note the reason. */
+   values[Anum_pg_subscription_subenabled - 1] = BoolGetDatum(false);
+   replaces[Anum_pg_subscription_subenabled - 1] = true;

I didn't see the code corresponding to "note the reason". Should we modify the
comment?

3)
+   booldisableonerr;   /* Whether errors automatically disable 
*/

This comment is hard to understand. Maybe it can be changed to:

Indicates if the subscription should be automatically disabled when subscription
workers detect any errors.

Regards,
Tang


RE: Optionally automatically disable logical replication subscriptions on error

2022-01-05 Thread osumi.takami...@fujitsu.com
On Tuesday, December 28, 2021 11:53 AM Wang, Wei/王 威  
wrote:
> On Thursday, December 16, 2021 8:51 PM osumi.takami...@fujitsu.com
>  wrote:
> > Attached the updated patch v14.
> 
> A comment to the timing of printing a log:
Thank you for your review !

> After the log[1] was printed, I altered subscription's option
> (DISABLE_ON_ERROR) from true to false before invoking
> DisableSubscriptionOnError to disable subscription. Subscription was not
> disabled.
> [1] "LOG:  logical replication subscription "sub1" will be disabled due to an
> error"
> 
> I found this log is printed in function WorkerErrorRecovery:
> + ereport(LOG,
> + errmsg("logical replication subscription \"%s\" will
> be disabled due to an error",
> +MySubscription->name));
> This log is printed here, but in DisableSubscriptionOnError, there is a check 
> to
> confirm subscription's disableonerr field. If disableonerr is found changed 
> from
> true to false in DisableSubscriptionOnError, subscription will not be 
> disabled.
> 
> In this case, "disable subscription" is printed, but subscription will not be
> disabled actually.
> I think it is a little confused to user, so what about moving this message 
> after
> the check which is mentioned above in DisableSubscriptionOnError?
Makes sense. I moved the log print after
the check of the necessity to disable the subscription.

Also, I've scrutinized and refined the new TAP test as well for refactoring.
As a result, I fixed wait_for_subscriptions()
so that some extra codes that can be simplified,
such as escaped variable and one part of WHERE clause, are removed.
Other change I did is to replace two calls of wait_for_subscriptions()
with polling_query_until() for the subscriber, in order to
make the tests better and more suitable for the test purposes.
Again, for this refinement, I've conducted a tight loop test
to check no timing issue and found no problem.

Best Regards,
Takamichi Osumi



v15-0001-Optionally-disable-subscriptions-on-error.patch
Description: v15-0001-Optionally-disable-subscriptions-on-error.patch


RE: Optionally automatically disable logical replication subscriptions on error

2021-12-27 Thread wangw.f...@fujitsu.com
On Thursday, December 16, 2021 8:51 PM osumi.takami...@fujitsu.com 
 wrote:
> Attached the updated patch v14.

A comment to the timing of printing a log:
After the log[1] was printed, I altered subscription's option
(DISABLE_ON_ERROR) from true to false before invoking DisableSubscriptionOnError
to disable subscription. Subscription was not disabled.
[1] "LOG:  logical replication subscription "sub1" will be disabled due to an 
error"

I found this log is printed in function WorkerErrorRecovery:
+   ereport(LOG,
+   errmsg("logical replication subscription \"%s\" will be 
disabled due to an error",
+  MySubscription->name));
This log is printed here, but in DisableSubscriptionOnError, there is a check to
confirm subscription's disableonerr field. If disableonerr is found changed from
true to false in DisableSubscriptionOnError, subscription will not be disabled.

In this case, "disable subscription" is printed, but subscription will not be
disabled actually.
I think it is a little confused to user, so what about moving this message after
the check which is mentioned above in DisableSubscriptionOnError?


Regards,
Wang wei


RE: Optionally automatically disable logical replication subscriptions on error

2021-12-22 Thread osumi.takami...@fujitsu.com
On Tuesday, December 21, 2021 11:18 PM I wrote:
> On Thursday, December 16, 2021 9:51 PM I wrote:
> > Attached the updated patch v14.
> FYI, I've conducted a test of disable_on_error flag using pg_upgrade.  I
> prepared PG14 and HEAD applied with disable_on_error patch.
> Then, I setup a logical replication pair of the publisher and the subscriber 
> by 14
> and executed pg_upgrade for both the publisher and the subscriber
> individually.
> 
> After the updation, on the subscriber, I've confirmed the disable_on_error is
> false via both pg_subscription and \dRs+, as expected.
Additionally, I've tested the new TAP test in a tight loop
that executed 027_disable_on_error.pl 100 times sequentially.
There was no failure, which means
any timing issue should not exist in the test.

Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2021-12-21 Thread osumi.takami...@fujitsu.com
On Thursday, December 16, 2021 9:51 PM I wrote:
> Attached the updated patch v14.
FYI, I've conducted a test of disable_on_error flag using
pg_upgrade.  I prepared PG14 and HEAD applied with disable_on_error patch.
Then, I setup a logical replication pair of the publisher and the subscriber by 
14
and executed pg_upgrade for both the publisher and the subscriber individually.

After the updation, on the subscriber, I've confirmed the disable_on_error is 
false
via both pg_subscription and \dRs+, as expected.

Best Regards,
Takamichi Osumi




RE: Optionally automatically disable logical replication subscriptions on error

2021-12-16 Thread osumi.takami...@fujitsu.com
On Thursday, December 16, 2021 2:32 PM Greg Nancarrow  
wrote:
> On Tue, Dec 14, 2021 at 4:34 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Besides all of those changes, I've removed the obsolete comment of
> > DisableSubscriptionOnError in v12.
> >
> 
> I have a few minor comments, otherwise the patch LGTM at this point:
Thank you for your review !

> doc/src/sgml/catalogs.sgml
> (1)
> Current comment says:
> 
> +   If true, the subscription will be disabled when subscription's
> +   worker detects any errors
> 
> However, in create_subscription.sgml, it says "disabled if any errors are
> detected by subscription workers ..."
> 
> For consistency, I think it should be:
> 
> +   If true, the subscription will be disabled when subscription
> +   workers detect any errors
Okay. Fixed.
 
> src/bin/psql/describe.c
> (2)
> I think that:
> 
> +  gettext_noop("Disable On Error"));
> 
> should be:
> 
> +  gettext_noop("Disable on error"));
> 
> for consistency with the uppercase/lowercase usage on other similar entries?
> (e.g. "Two phase commit")
Agreed. Fixed.

> src/include/catalog/pg_subscription.h
> (3)
> 
> +  bool subdisableonerr; /* True if apply errors should disable the
> +   * subscription upon error */
> 
> The comment should just say "True if occurrence of apply errors should disable
> the subscription"
Fixed.

Attached the updated patch v14.

Best Regards,
Takamichi Osumi



v14-0001-Optionally-disable-subscriptions-on-error.patch
Description: v14-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2021-12-15 Thread Greg Nancarrow
On Tue, Dec 14, 2021 at 4:34 PM osumi.takami...@fujitsu.com
 wrote:
>
> Besides all of those changes, I've removed the obsolete
> comment of DisableSubscriptionOnError in v12.
>

I have a few minor comments, otherwise the patch LGTM at this point:

doc/src/sgml/catalogs.sgml
(1)
Current comment says:

+   If true, the subscription will be disabled when subscription's
+   worker detects any errors

However, in create_subscription.sgml, it says "disabled if any errors
are detected by subscription workers ..."

For consistency, I think it should be:

+   If true, the subscription will be disabled when subscription
+   workers detect any errors

src/bin/psql/describe.c
(2)
I think that:

+  gettext_noop("Disable On Error"));

should be:

+  gettext_noop("Disable on error"));

for consistency with the uppercase/lowercase usage on other similar entries?
(e.g. "Two phase commit")


src/include/catalog/pg_subscription.h
(3)

+  bool subdisableonerr; /* True if apply errors should disable the
+   * subscription upon error */

The comment should just say "True if occurrence of apply errors should
disable the subscription"


Regards,
Greg Nancarrow
Fujitsu Australia




RE: Optionally automatically disable logical replication subscriptions on error

2021-12-13 Thread osumi.takami...@fujitsu.com
On Monday, December 13, 2021 6:57 PM vignesh C  wrote:
> On Mon, Dec 6, 2021 at 4:22 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > I've attached the new version v12.
I appreciate your review.


> Thanks for the updated patch, few comments:
> 1) This is not required as it is not used in the caller.
> +++ b/src/backend/replication/logical/launcher.c
> @@ -132,6 +132,7 @@ get_subscription_list(void)
> sub->dbid = subform->subdbid;
> sub->owner = subform->subowner;
> sub->enabled = subform->subenabled;
> +   sub->disableonerr = subform->subdisableonerr;
> sub->name = pstrdup(NameStr(subform->subname));
> /* We don't fill fields we are not interested in. */
Okay.
The comment of the get_subscription_list() mentions that
we collect and fill only fields related to worker start/stop.
Then, I didn't need it. Fixed.


> 2) Should this be changed:
> +   subdisableonerr bool
> +  
> +  
> +   If true, the subscription will be disabled when subscription
> +   worker detects any errors
> +  
> + 
> To:
> +   subdisableonerr bool
> +  
> +  
> +   If true, the subscription will be disabled when subscription's
> +   worker detects any errors
> +  
> + 
I felt either is fine. So fixed.


> 3) The last line can be slightly adjusted to keep within 80 chars:
> +  Specifies whether the subscription should be automatically disabled
> +  if any errors are detected by subscription workers during data
> +  replication from the publisher. The default is
> false.
Fixed.

> 4) Similarly this too can be handled:
> --- a/src/backend/catalog/system_views.sql
> +++ b/src/backend/catalog/system_views.sql
> @@ -1259,7 +1259,7 @@ REVOKE ALL ON pg_replication_origin_status FROM
> public;
>  -- All columns of pg_subscription except subconninfo are publicly readable.
>  REVOKE ALL ON pg_subscription FROM public;  GRANT SELECT (oid,
> subdbid, subname, subowner, subenabled, subbinary,
> -  substream, subtwophasestate, subslotname,
> subsynccommit, subpublications)
> +  substream, subtwophasestate, subdisableonerr,
> subslotname, subsynccommit, subpublications)
>  ON pg_subscription TO public;
I split the line into two to make each line less than 80 chars.

> 5) Since disabling subscription code is common in and else, can we move it
> below:
> +   if (MySubscription->disableonerr)
> +   {
> +   WorkerErrorRecovery();
> +   error_recovery_done = true;
> +   }
> +   else
> +   {
> +   /*
> +* Some work in error recovery work is
> done. Switch to the old
> +* memory context and rethrow.
> +*/
> +   MemoryContextSwitchTo(ecxt);
> +   PG_RE_THROW();
> +   }
> +   }
> +   else
> +   {
> +   /*
> +* Don't miss any error, even when it's not
> reported to stats
> +* collector.
> +*/
> +   if (MySubscription->disableonerr)
> +   {
> +   WorkerErrorRecovery();
> +   error_recovery_done = true;
> +   }
> +   else
> +   /* Simply rethrow because of no recovery
> work */
> +   PG_RE_THROW();
> +   }
I moved the common code below those condition branches.


> 6) Can we move LockSharedObject below the if condition.
> +   subform = (Form_pg_subscription) GETSTRUCT(tup);
> +   LockSharedObject(SubscriptionRelationId, subform->oid, 0,
> AccessExclusiveLock);
> +
> +   /*
> +* We would not be here unless this subscription's
> disableonerr field was
> +* true when our worker began applying changes, but check whether
> that
> +* field has changed in the interim.
> +*/
> +   if (!subform->subdisableonerr)
> +   {
> +   /*
> +* Disabling the subscription has been done already. No need
> of
> +* additional work.
> +*/
> +   heap_freetuple(tup);
> +   table_close(rel, RowExclusiveLock);
> +   CommitTransactionCommand();
> +   return;
> +   }
> +
Fixed.

Besides all of those changes, I've removed the obsolete
comment of DisableSubscriptionOnError in v12.

Best Regards,
Takamichi Osumi



v13-0001-Optionally-disable-subscriptions-on-error.patch
Description: v13-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2021-12-13 Thread vignesh C
On Mon, Dec 6, 2021 at 4:22 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, December 6, 2021 1:16 PM Greg Nancarrow  
> wrote:
> > On Sat, Dec 4, 2021 at 12:20 AM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > Hi, I've made a new patch v11 that incorporated suggestions described
> > above.
> > >
> >
> > Some review comments for the v11 patch:
> Thank you for your reviews !
>
> > doc/src/sgml/ref/create_subscription.sgml
> > (1) Possible wording improvement?
> >
> > BEFORE:
> > +  Specifies whether the subscription should be automatically disabled
> > + if replicating data from the publisher triggers errors. The default
> > + is false.
> > AFTER:
> > +  Specifies whether the subscription should be automatically disabled
> > + if any errors are detected by subscription workers during data
> > + replication from the publisher. The default is false.
> Fixed.
>
> > src/backend/replication/logical/worker.c
> > (2) WorkerErrorRecovery comments
> > Instead of:
> >
> > + * As a preparation for disabling the subscription, emit the error,
> > + * handle the transaction and clean up the memory context of
> > + * error. ErrorContext is reset by FlushErrorState.
> >
> > why not just say:
> >
> > + Worker error recovery processing, in preparation for disabling the
> > + subscription.
> >
> > And then comment the function's code lines:
> >
> > e.g.
> >
> > /* Emit the error */
> > ...
> > /* Abort any active transaction */
> > ...
> > /* Reset the ErrorContext */
> > ...
> Agreed. Fixed.
>
> > (3) DisableSubscriptionOnError return
> >
> > The "if (!subform->subdisableonerr)" block should probably first:
> >heap_freetuple(tup);
> >
> > (regardless of the fact the only current caller will proc_exit anyway)
> Fixed.
>
> > (4) did_error flag
> >
> > I think perhaps the previously-used flag name "disable_subscription"
> > is better, or maybe "error_recovery_done".
> > Also, I think it would look better if it was set AFTER
> > WorkerErrorRecovery() was called.
> Adopted error_recovery_done
> and changed its places accordingly.
>
> > (5) DisableSubscriptionOnError LOG message
> >
> > This version of the patch removes the LOG message:
> >
> > + ereport(LOG,
> > + errmsg("logical replication subscription \"%s\" will be disabled due
> > to error: %s",
> > +MySubscription->name, edata->message));
> >
> > Perhaps a similar error message could be logged prior to EmitErrorReport()?
> >
> > e.g.
> >  "logical replication subscription \"%s\" will be disabled due to an error"
> Added.
>
> I've attached the new version v12.

Thanks for the updated patch, few comments:
1) This is not required as it is not used in the caller.
+++ b/src/backend/replication/logical/launcher.c
@@ -132,6 +132,7 @@ get_subscription_list(void)
sub->dbid = subform->subdbid;
sub->owner = subform->subowner;
sub->enabled = subform->subenabled;
+   sub->disableonerr = subform->subdisableonerr;
sub->name = pstrdup(NameStr(subform->subname));
/* We don't fill fields we are not interested in. */

2) Should this be changed:
+   subdisableonerr bool
+  
+  
+   If true, the subscription will be disabled when subscription
+   worker detects any errors
+  
+ 
To:
+   subdisableonerr bool
+  
+  
+   If true, the subscription will be disabled when subscription's
+   worker detects any errors
+  
+ 

3) The last line can be slightly adjusted to keep within 80 chars:
+  Specifies whether the subscription should be automatically disabled
+  if any errors are detected by subscription workers during data
+  replication from the publisher. The default is
false.

4) Similarly this too can be handled:
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1259,7 +1259,7 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 -- All columns of pg_subscription except subconninfo are publicly readable.
 REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
-  substream, subtwophasestate, subslotname,
subsynccommit, subpublications)
+  substream, subtwophasestate, subdisableonerr,
subslotname, subsynccommit, subpublications)
 ON pg_subscription TO public;

5) Since disabling subscription code is common in and else, can we
move it below:
+   if (MySubscription->disableonerr)
+   {
+   WorkerErrorRecovery();
+   error_recovery_done = true;
+   }
+   else
+   {
+   /*
+* Some work in error recovery work is
done. Switch to the old
+* memory context and rethrow.
+*/
+   

Re: Optionally automatically disable logical replication subscriptions on error

2021-12-09 Thread Mark Dilger



> On Dec 8, 2021, at 8:09 PM, Amit Kapila  wrote:
> 
> So, do you agree that we can disable the subscription on any error if
> this parameter is set?

Yes, I think that is fine.  We can commit it that way, and revisit the issue 
for v16 if it becomes a problem in practice.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Optionally automatically disable logical replication subscriptions on error

2021-12-08 Thread Amit Kapila
On Wed, Dec 8, 2021 at 9:22 PM Mark Dilger  wrote:
>
>
> > On Dec 8, 2021, at 5:10 AM, Amit Kapila  wrote:
> >
> > I think if we are really worried about transient errors then probably
> > the idea "disable only if the same error has occurred more than X
> > times" seems preferable as compared to taking a decision on which
> > error_codes fall in the transient error category.
>
> No need.  We can revisit this design decision in a later release cycle if the 
> current patch's design proves problematic in the field.
>

So, do you agree that we can disable the subscription on any error if
this parameter is set?

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2021-12-08 Thread Mark Dilger



> On Dec 8, 2021, at 5:10 AM, Amit Kapila  wrote:
> 
> I think if we are really worried about transient errors then probably
> the idea "disable only if the same error has occurred more than X
> times" seems preferable as compared to taking a decision on which
> error_codes fall in the transient error category.

No need.  We can revisit this design decision in a later release cycle if the 
current patch's design proves problematic in the field.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Optionally automatically disable logical replication subscriptions on error

2021-12-08 Thread Amit Kapila
On Tue, Dec 7, 2021 at 5:52 AM Greg Nancarrow  wrote:
>
> On Tue, Dec 7, 2021 at 3:06 AM Mark Dilger  
> wrote:
> >
> > My concern about disabling a subscription in response to *any* error is 
> > that people may find the feature does more harm than good.  Disabling the 
> > subscription in response to an occasional deadlock against other database 
> > users, or occasional resource pressure, might annoy people and lead to the 
> > feature simply not being used.
> >
> I can understand this point of view.
> It kind of suggests to me the possibility of something like a
> configurable timeout (e.g. disable the subscription if the same error
> has occurred for more than X minutes) or, similarly, perhaps if some
> threshold has been reached (e.g. same error has occurred more than X
> times), but I think that this was previously suggested by Peter Smith
> and the idea wasn't looked upon all that favorably?
>

I think if we are really worried about transient errors then probably
the idea "disable only if the same error has occurred more than X
times" seems preferable as compared to taking a decision on which
error_codes fall in the transient error category.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2021-12-06 Thread Greg Nancarrow
On Tue, Dec 7, 2021 at 3:06 AM Mark Dilger  wrote:
>
> My concern about disabling a subscription in response to *any* error is that 
> people may find the feature does more harm than good.  Disabling the 
> subscription in response to an occasional deadlock against other database 
> users, or occasional resource pressure, might annoy people and lead to the 
> feature simply not being used.
>
I can understand this point of view.
It kind of suggests to me the possibility of something like a
configurable timeout (e.g. disable the subscription if the same error
has occurred for more than X minutes) or, similarly, perhaps if some
threshold has been reached (e.g. same error has occurred more than X
times), but I think that this was previously suggested by Peter Smith
and the idea wasn't looked upon all that favorably?

Regards,
Greg Nancarrow
Fujitsu Australia




Re: Optionally automatically disable logical replication subscriptions on error

2021-12-06 Thread Mark Dilger



> On Dec 5, 2021, at 10:56 PM, osumi.takami...@fujitsu.com wrote:
> 
> In my humble opinion, I felt the original purpose of the patch was to 
> partially remedy
> the situation that during the failure of apply, the apply process keeps going
> into the infinite error loop.

I agree.

> I'd say that in this sense, if we include such resource errors, we fail to 
> achieve
> the purpose in some cases, because of some left possibilities of infinite 
> loop.
> Disabling the subscription with even one any error excludes this irregular 
> possibility,
> since there's no room to continue the infinite loop.

I don't think there is any right answer here.  It's a question of policy 
preferences.

My concern about disabling a subscription in response to *any* error is that 
people may find the feature does more harm than good.  Disabling the 
subscription in response to an occasional deadlock against other database 
users, or occasional resource pressure, might annoy people and lead to the 
feature simply not being used.

I am happy to defer to your policy preference.  Thanks for your work on the 
patch!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Optionally automatically disable logical replication subscriptions on error

2021-12-06 Thread Amit Kapila
On Mon, Dec 6, 2021 at 10:07 AM Mark Dilger
 wrote:
>
> > On Dec 1, 2021, at 8:48 PM, Amit Kapila  wrote:
> >
> > The patch disables the subscription for non-transient errors. I am not
> > sure if we can easily make the call to decide whether any particular
> > error is transient or not. For example, DISK_FULL or OUT_OF_MEMORY
> > might not rectify itself. Why not just allow to disable the
> > subscription on any error? And then let the user check the error
> > either in view or logs and decide whether it would like to enable the
> > subscription or do something before it (like making space in disk, or
> > fixing the network).
>
> The original idea of the patch, back when I first wrote and proposed it, was 
> to remove the *absurdity* of retrying a transaction which, in the absence of 
> human intervention, was guaranteed to simply fail again ad infinitum.  
> Retrying in the face of resource errors is not *absurd* even though it might 
> fail again ad infinitum.  The reason is that there is at least a chance that 
> the situation will clear up without human intervention.
>
> > The other problem I see with this transient error stuff is maintaining
> > the list of error codes that we think are transient. I think we need a
> > discussion for each of the error_codes we are listing now and whatever
> > new error_code we add in the future which doesn't seem like a good
> > idea.
>
> A reasonable rule might be:  "the subscription will be disabled if the server 
> can determine that retries cannot possibly succeed without human 
> intervention."  We shouldn't need to categorize all error codes perfectly, as 
> long as we're conservative.  What I propose is similar to how we determine 
> whether to mark a function leakproof; we don't have to mark all leakproof 
> functions as such, we just can't mark one as such if it is not.
>
> If we're going to debate the error codes, I think we would start with an 
> empty list, and add to the list on sufficient analysis.
>

Yeah, an empty list is a sort of what I thought was a good start
point. I feel we should learn from real-world use cases to see if
people really want to continue retrying even after using this option.


-- 
With Regards,
Amit Kapila.




RE: Optionally automatically disable logical replication subscriptions on error

2021-12-06 Thread osumi.takami...@fujitsu.com
On Monday, December 6, 2021 1:16 PM Greg Nancarrow  wrote:
> On Sat, Dec 4, 2021 at 12:20 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Hi, I've made a new patch v11 that incorporated suggestions described
> above.
> >
> 
> Some review comments for the v11 patch:
Thank you for your reviews !
 
> doc/src/sgml/ref/create_subscription.sgml
> (1) Possible wording improvement?
> 
> BEFORE:
> +  Specifies whether the subscription should be automatically disabled
> + if replicating data from the publisher triggers errors. The default
> + is false.
> AFTER:
> +  Specifies whether the subscription should be automatically disabled
> + if any errors are detected by subscription workers during data
> + replication from the publisher. The default is false.
Fixed.

> src/backend/replication/logical/worker.c
> (2) WorkerErrorRecovery comments
> Instead of:
> 
> + * As a preparation for disabling the subscription, emit the error,
> + * handle the transaction and clean up the memory context of
> + * error. ErrorContext is reset by FlushErrorState.
> 
> why not just say:
> 
> + Worker error recovery processing, in preparation for disabling the
> + subscription.
> 
> And then comment the function's code lines:
> 
> e.g.
> 
> /* Emit the error */
> ...
> /* Abort any active transaction */
> ...
> /* Reset the ErrorContext */
> ...
Agreed. Fixed.
 
> (3) DisableSubscriptionOnError return
> 
> The "if (!subform->subdisableonerr)" block should probably first:
>heap_freetuple(tup);
> 
> (regardless of the fact the only current caller will proc_exit anyway)
Fixed.
 
> (4) did_error flag
> 
> I think perhaps the previously-used flag name "disable_subscription"
> is better, or maybe "error_recovery_done".
> Also, I think it would look better if it was set AFTER
> WorkerErrorRecovery() was called.
Adopted error_recovery_done
and changed its places accordingly.
 
> (5) DisableSubscriptionOnError LOG message
> 
> This version of the patch removes the LOG message:
> 
> + ereport(LOG,
> + errmsg("logical replication subscription \"%s\" will be disabled due
> to error: %s",
> +MySubscription->name, edata->message));
> 
> Perhaps a similar error message could be logged prior to EmitErrorReport()?
> 
> e.g.
>  "logical replication subscription \"%s\" will be disabled due to an error"
Added.

I've attached the new version v12.

Best Regards,
Takamichi Osumi



v12-0001-Optionally-disable-subscriptions-on-error.patch
Description: v12-0001-Optionally-disable-subscriptions-on-error.patch


RE: Optionally automatically disable logical replication subscriptions on error

2021-12-05 Thread osumi.takami...@fujitsu.com
On Monday, December 6, 2021 1:38 PM Mark Dilger  
wrote:
> > On Dec 1, 2021, at 8:48 PM, Amit Kapila  wrote:
> >
> > The patch disables the subscription for non-transient errors. I am not
> > sure if we can easily make the call to decide whether any particular
> > error is transient or not. For example, DISK_FULL or OUT_OF_MEMORY
> > might not rectify itself. Why not just allow to disable the
> > subscription on any error? And then let the user check the error
> > either in view or logs and decide whether it would like to enable the
> > subscription or do something before it (like making space in disk, or
> > fixing the network).
> 
> The original idea of the patch, back when I first wrote and proposed it, was 
> to
> remove the *absurdity* of retrying a transaction which, in the absence of
> human intervention, was guaranteed to simply fail again ad infinitum.
> Retrying in the face of resource errors is not *absurd* even though it might 
> fail
> again ad infinitum.  The reason is that there is at least a chance that the
> situation will clear up without human intervention.
In my humble opinion, I felt the original purpose of the patch was to partially 
remedy
the situation that during the failure of apply, the apply process keeps going
into the infinite error loop.

I'd say that in this sense, if we include such resource errors, we fail to 
achieve
the purpose in some cases, because of some left possibilities of infinite loop.
Disabling the subscription with even one any error excludes this irregular 
possibility,
since there's no room to continue the infinite loop.

Best Regards,
Takamichi Osumi



Re: Optionally automatically disable logical replication subscriptions on error

2021-12-05 Thread Mark Dilger



> On Dec 1, 2021, at 8:48 PM, Amit Kapila  wrote:
> 
> The patch disables the subscription for non-transient errors. I am not
> sure if we can easily make the call to decide whether any particular
> error is transient or not. For example, DISK_FULL or OUT_OF_MEMORY
> might not rectify itself. Why not just allow to disable the
> subscription on any error? And then let the user check the error
> either in view or logs and decide whether it would like to enable the
> subscription or do something before it (like making space in disk, or
> fixing the network).

The original idea of the patch, back when I first wrote and proposed it, was to 
remove the *absurdity* of retrying a transaction which, in the absence of human 
intervention, was guaranteed to simply fail again ad infinitum.  Retrying in 
the face of resource errors is not *absurd* even though it might fail again ad 
infinitum.  The reason is that there is at least a chance that the situation 
will clear up without human intervention.

> The other problem I see with this transient error stuff is maintaining
> the list of error codes that we think are transient. I think we need a
> discussion for each of the error_codes we are listing now and whatever
> new error_code we add in the future which doesn't seem like a good
> idea.

A reasonable rule might be:  "the subscription will be disabled if the server 
can determine that retries cannot possibly succeed without human intervention." 
 We shouldn't need to categorize all error codes perfectly, as long as we're 
conservative.  What I propose is similar to how we determine whether to mark a 
function leakproof; we don't have to mark all leakproof functions as such, we 
just can't mark one as such if it is not.

If we're going to debate the error codes, I think we would start with an empty 
list, and add to the list on sufficient analysis.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Optionally automatically disable logical replication subscriptions on error

2021-12-05 Thread Greg Nancarrow
On Sat, Dec 4, 2021 at 12:20 AM osumi.takami...@fujitsu.com
 wrote:
>
> Hi, I've made a new patch v11 that incorporated suggestions described above.
>

Some review comments for the v11 patch:

doc/src/sgml/ref/create_subscription.sgml
(1) Possible wording improvement?

BEFORE:
+  Specifies whether the subscription should be automatically disabled
+  if replicating data from the publisher triggers errors. The default
+  is false.
AFTER:
+  Specifies whether the subscription should be automatically disabled
+  if any errors are detected by subscription workers during data
+  replication from the publisher. The default is false.

src/backend/replication/logical/worker.c
(2) WorkerErrorRecovery comments
Instead of:

+ * As a preparation for disabling the subscription, emit the error,
+ * handle the transaction and clean up the memory context of
+ * error. ErrorContext is reset by FlushErrorState.

why not just say:

+ Worker error recovery processing, in preparation for disabling the
+ subscription.

And then comment the function's code lines:

e.g.

/* Emit the error */
...
/* Abort any active transaction */
...
/* Reset the ErrorContext */
...

(3) DisableSubscriptionOnError return

The "if (!subform->subdisableonerr)" block should probably first:
   heap_freetuple(tup);

(regardless of the fact the only current caller will proc_exit anyway)

(4) did_error flag

I think perhaps the previously-used flag name "disable_subscription"
is better, or maybe "error_recovery_done".
Also, I think it would look better if it was set AFTER
WorkerErrorRecovery() was called.

(5) DisableSubscriptionOnError LOG message

This version of the patch removes the LOG message:

+ ereport(LOG,
+ errmsg("logical replication subscription \"%s\" will be disabled due
to error: %s",
+MySubscription->name, edata->message));

Perhaps a similar error message could be logged prior to EmitErrorReport()?

e.g.
 "logical replication subscription \"%s\" will be disabled due to an error"


Regards,
Greg Nancarrow
Fujitsu Australia




RE: Optionally automatically disable logical replication subscriptions on error

2021-12-03 Thread osumi.takami...@fujitsu.com
Thursday, December 2, 2021 4:41 PM I wrote:
> On Thursday, December 2, 2021 1:49 PM Amit Kapila
>  wrote:
> > On Thu, Dec 2, 2021 at 6:35 AM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Wednesday, December 1, 2021 10:16 PM Amit Kapila
> >  wrote:
> > > Updated the patch to include the notification.
> > >
> > The patch disables the subscription for non-transient errors. I am not
> > sure if we can easily make the call to decide whether any particular
> > error is transient or not. For example, DISK_FULL or OUT_OF_MEMORY
> might not rectify itself.
> > Why not just allow to disable the subscription on any error? And then
> > let the user check the error either in view or logs and decide whether
> > it would like to enable the subscription or do something before it
> > (like making space in disk, or fixing the network).
> Agreed. I'll treat any errors as the trigger of the feature in the next 
> version.
> 
> > The other problem I see with this transient error stuff is maintaining
> > the list of error codes that we think are transient. I think we need a
> > discussion for each of the error_codes we are listing now and whatever
> > new error_code we add in the future which doesn't seem like a good idea.
> This is also true. The maintenance cost of my current implementation didn't
> sound cheap.
> 
> > I think the code to deal with apply worker errors and then disable the
> > subscription has some flaws. Say, while disabling the subscription if
> > it leads to another error then I think the original error won't be
> > reported.  Can't we simply emit the error via EmitErrorReport and then
> > do AbortOutOfAnyTransaction, FlushErrorState, and any other memory
> > context clean up if required and then disable the subscription after coming
> out of catch?
> You are right. I'll fix related parts accordingly.
Hi, I've made a new patch v11 that incorporated suggestions described above.

There are several notes to share regarding v11 modifications.

1. Modified the commit message a bit.

2. DisableSubscriptionOnError() doesn't return ErrData anymore,
since now to emit error message is done in the error recovery area
and the function purpose has become purely to run a transaction to disable
the subscription.

3. In DisableSubscriptionOnError(), v10 rethrew the error if the 
disable_on_error
flag became false in the interim, but v11 just closes the transaction and
finishes the function.

4. If table sync worker detects an error during synchronization
and needs to disable the subscription, the worker disables it and just exit by 
proc_exit.
The processing after disabling the subscription didn't look necessary to me
for disabled subscription.

5. Only when we succeed in the table synchronization, it's necessary to
allocate slot name in long-lived context, after the table synchronization in
ApplyWorkerMain(). Otherwise, we'll see junk value of syncslotname
because it is the return value of LogicalRepSyncTableStart().

6. There are 3 places for error recovery in ApplyWorkerMain().
All of those might look similar but I didn't make an united function for them.
Those are slightly different from each other and I felt
readability is reduced by grouping them into one type of function call.

7. In v11, I covered the case that apply worker failed with
apply_error_callback_arg.command == 0, as one path to disable the subscription
in order to cover all errors.

8. I changed one flag name from 'disable_subscription' to 'did_error'
in ApplyWorkerMain().

9. All chages in this version are C codes and checked by pgindent.

Best Regards,
Takamichi Osumi



v11-0001-Optionally-disable-subscriptions-on-error.patch
Description: v11-0001-Optionally-disable-subscriptions-on-error.patch


RE: Optionally automatically disable logical replication subscriptions on error

2021-12-01 Thread osumi.takami...@fujitsu.com
On Thursday, December 2, 2021 1:49 PM Amit Kapila  
wrote:
> On Thu, Dec 2, 2021 at 6:35 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Wednesday, December 1, 2021 10:16 PM Amit Kapila
>  wrote:
> > Updated the patch to include the notification.
> >
> The patch disables the subscription for non-transient errors. I am not sure 
> if we
> can easily make the call to decide whether any particular error is transient 
> or
> not. For example, DISK_FULL or OUT_OF_MEMORY might not rectify itself.
> Why not just allow to disable the subscription on any error? And then let the
> user check the error either in view or logs and decide whether it would like 
> to
> enable the subscription or do something before it (like making space in disk, 
> or
> fixing the network).
Agreed. I'll treat any errors as the trigger of the feature
in the next version.

> The other problem I see with this transient error stuff is maintaining the 
> list of
> error codes that we think are transient. I think we need a discussion for 
> each of
> the error_codes we are listing now and whatever new error_code we add in the
> future which doesn't seem like a good idea.
This is also true. The maintenance cost of my current implementation
didn't sound cheap.

> I think the code to deal with apply worker errors and then disable the
> subscription has some flaws. Say, while disabling the subscription if it 
> leads to
> another error then I think the original error won't be reported.  Can't we 
> simply
> emit the error via EmitErrorReport and then do AbortOutOfAnyTransaction,
> FlushErrorState, and any other memory context clean up if required and then
> disable the subscription after coming out of catch?
You are right. I'll fix related parts accordingly.

Best Regards,
Takamichi Osumi



Re: Optionally automatically disable logical replication subscriptions on error

2021-12-01 Thread Amit Kapila
On Thu, Dec 2, 2021 at 6:35 AM osumi.takami...@fujitsu.com
 wrote:
>
> On Wednesday, December 1, 2021 10:16 PM Amit Kapila  
> wrote:
> Updated the patch to include the notification.
>

The patch disables the subscription for non-transient errors. I am not
sure if we can easily make the call to decide whether any particular
error is transient or not. For example, DISK_FULL or OUT_OF_MEMORY
might not rectify itself. Why not just allow to disable the
subscription on any error? And then let the user check the error
either in view or logs and decide whether it would like to enable the
subscription or do something before it (like making space in disk, or
fixing the network).

The other problem I see with this transient error stuff is maintaining
the list of error codes that we think are transient. I think we need a
discussion for each of the error_codes we are listing now and whatever
new error_code we add in the future which doesn't seem like a good
idea.

I think the code to deal with apply worker errors and then disable the
subscription has some flaws. Say, while disabling the subscription if
it leads to another error then I think the original error won't be
reported.  Can't we simply emit the error via EmitErrorReport and then
do AbortOutOfAnyTransaction, FlushErrorState, and any other memory
context clean up if required and then disable the subscription after
coming out of catch?

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2021-12-01 Thread Greg Nancarrow
On Thu, Dec 2, 2021 at 12:05 PM osumi.takami...@fujitsu.com
 wrote:
>
> Updated the patch to include the notification.
>

For the catalogs.sgml update, I was thinking that the following
wording might sound a bit better:

+   If true, the subscription will be disabled when a subscription
+   worker detects non-transient errors (e.g. duplication error)
+   that require user intervention to resolve.

What do you think?

Regards,
Greg Nancarrow
Fujitsu Australia




RE: Optionally automatically disable logical replication subscriptions on error

2021-12-01 Thread osumi.takami...@fujitsu.com
On Wednesday, December 1, 2021 10:16 PM Amit Kapila  
wrote:
> On Wed, Dec 1, 2021 at 5:55 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Wednesday, December 1, 2021 3:02 PM vignesh C
>  wrote:
> > > On Tue, Nov 30, 2021 at 5:34 PM osumi.takami...@fujitsu.com
> > >  wrote:
> >
> > > 3) Can add a line in the commit message saying "Bump catalog version."
> > > as the patch involves changing the catalog.
> > Hmm, let me postpone this fix till the final version.
> > The catalog version gets easily updated by other patch commits and
> > including it in the middle of development can become cause of
> > conflicts of my patch when applied to the PG, which is possible to
> > make other reviewers stop reviewing.
> >
> 
> Vignesh seems to be suggesting just changing the commit message, not the
> actual code. This is sort of a reminder to the committer to change the 
> catversion
> before pushing the patch. So that shouldn't cause any conflicts while applying
> your patch.
Ah, sorry for my misunderstanding.
Updated the patch to include the notification.


Best Regards,
Takamichi Osumi



v10-0001-Optionally-disable-subscriptions-on-error.patch
Description: v10-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2021-12-01 Thread Amit Kapila
On Wed, Dec 1, 2021 at 5:55 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Wednesday, December 1, 2021 3:02 PM vignesh C  wrote:
> > On Tue, Nov 30, 2021 at 5:34 PM osumi.takami...@fujitsu.com
> >  wrote:
>
> > 3) Can add a line in the commit message saying "Bump catalog version."
> > as the patch involves changing the catalog.
> Hmm, let me postpone this fix till the final version.
> The catalog version gets easily updated by other patch commits
> and including it in the middle of development can become
> cause of conflicts of my patch when applied to the PG,
> which is possible to make other reviewers stop reviewing.
>

Vignesh seems to be suggesting just changing the commit message, not
the actual code. This is sort of a reminder to the committer to change
the catversion before pushing the patch. So that shouldn't cause any
conflicts while applying your patch.

-- 
With Regards,
Amit Kapila.




RE: Optionally automatically disable logical replication subscriptions on error

2021-12-01 Thread osumi.takami...@fujitsu.com
On Wednesday, December 1, 2021 3:02 PM vignesh C  wrote:
> On Tue, Nov 30, 2021 at 5:34 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Tuesday, November 30, 2021 1:10 PM Greg Nancarrow
>  wrote:
> > > On Sat, Nov 27, 2021 at 1:36 AM osumi.takami...@fujitsu.com
> > >  wrote:
> > > >
> > > > This v7 uses v26 of skip xid patch [1]
> > > This patch no longer applies on the latest source.
> > > Also, the patch is missing an update to doc/src/sgml/catalogs.sgml,
> > > for the new "subdisableonerr" column of pg_subscription.
> > Thanks for your review !
> >
> > Fixed the documentation accordingly. Further, this comment invoked
> > some more refactoring of codes since I wrote some internal codes
> > related to 'disable_on_error' in an inconsistent order.
> > I fixed this by keeping patch's codes
> > after that of 'two_phase' subscription option as much as possible.
> >
> > I also conducted both pgindent and pgperltidy.
> >
> > Now, I'll share the v8 that uses PG
> > whose commit id is after 8d74fc9 (pg_stat_subscription_workers).
> 
> Thanks for the updated patch, few small comments:
I appreciate your check.

> 1) This should be changed:
> +   subdisableonerr bool
> +  
> +  
> +   If true, the subscription will be disabled when subscription
> +   worker detects an error
> +  
> + 
> 
> to:
> +   subdisableonerr bool
> +  
> +  
> +   If true, the subscription will be disabled when subscription
> +   worker detects non-transient errors
> +  
> + 
Fixed. Actually, there's no clear definition what "non-transient" means
in the documentation. So, I added some words to your suggestion,
which would give clearer understanding to users.

> 2) "Disable On Err" can be changed to "Disable On Error"
> + ",
> subtwophasestate AS \"%s\"\n"
> + ",
> subdisableonerr AS \"%s\"\n",
> +
> gettext_noop("Two phase commit"),
> +
> gettext_noop("Disable On Err"));
Fixed.

> 3) Can add a line in the commit message saying "Bump catalog version."
> as the patch involves changing the catalog.
Hmm, let me postpone this fix till the final version.
The catalog version gets easily updated by other patch commits
and including it in the middle of development can become
cause of conflicts of my patch when applied to the PG,
which is possible to make other reviewers stop reviewing.

> 4) This prototype is not required, since the function is called after the 
> function
> definition:
>  static inline void set_apply_error_context_xact(TransactionId xid,
> TimestampTz ts);  static inline void reset_apply_error_context_info(void);
> +static bool IsTransientError(ErrorData *edata);
Fixed.

> 5) we could use the new style here:
> +   ereport(LOG,
> +   (errmsg("logical replication subscription
> \"%s\" will be disabled due to error: %s",
> +   MySubscription->name,
> + edata->message)));
> 
> change it to:
> +   ereport(LOG,
> +   errmsg("logical replication subscription
> \"%s\" will be disabled due to error: %s",
> +   MySubscription->name,
> + edata->message));
> 
> Similarly it can be changed in the other ereports added.
Removed the unnecessary parentheses.

Best Regards,
Takamichi Osumi



v9-0001-Optionally-disable-subscriptions-on-error.patch
Description: v9-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2021-11-30 Thread vignesh C
On Tue, Nov 30, 2021 at 5:34 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Tuesday, November 30, 2021 1:10 PM Greg Nancarrow  
> wrote:
> > On Sat, Nov 27, 2021 at 1:36 AM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > This v7 uses v26 of skip xid patch [1]
> > This patch no longer applies on the latest source.
> > Also, the patch is missing an update to doc/src/sgml/catalogs.sgml, for the
> > new "subdisableonerr" column of pg_subscription.
> Thanks for your review !
>
> Fixed the documentation accordingly. Further,
> this comment invoked some more refactoring of codes
> since I wrote some internal codes related to
> 'disable_on_error' in an inconsistent order.
> I fixed this by keeping patch's codes
> after that of 'two_phase' subscription option as much as possible.
>
> I also conducted both pgindent and pgperltidy.
>
> Now, I'll share the v8 that uses PG
> whose commit id is after 8d74fc9 (pg_stat_subscription_workers).

Thanks for the updated patch, few small comments:
1) This should be changed:
+   subdisableonerr bool
+  
+  
+   If true, the subscription will be disabled when subscription
+   worker detects an error
+  
+ 

to:
+   subdisableonerr bool
+  
+  
+   If true, the subscription will be disabled when subscription
+   worker detects non-transient errors
+  
+ 


2) "Disable On Err" can be changed to "Disable On Error"
+ ",
subtwophasestate AS \"%s\"\n"
+ ",
subdisableonerr AS \"%s\"\n",
+
gettext_noop("Two phase commit"),
+
gettext_noop("Disable On Err"));

3) Can add a line in the commit message saying "Bump catalog version."
as the patch involves changing the catalog.

4) This prototype is not required, since the function is called after
the function definition:
 static inline void set_apply_error_context_xact(TransactionId xid,
TimestampTz ts);
 static inline void reset_apply_error_context_info(void);
+static bool IsTransientError(ErrorData *edata);

5) we could use the new style here:
+   ereport(LOG,
+   (errmsg("logical replication subscription
\"%s\" will be disabled due to error: %s",
+   MySubscription->name, edata->message)));

change it to:
+   ereport(LOG,
+   errmsg("logical replication subscription
\"%s\" will be disabled due to error: %s",
+   MySubscription->name, edata->message));

Similarly it can be changed in the other ereports added.

Regards,
Vignesh




RE: Optionally automatically disable logical replication subscriptions on error

2021-11-30 Thread osumi.takami...@fujitsu.com
On Monday, November 29, 2021 2:38 PM vignesh C 
> Thanks for the updated patch, Few comments:
Thank you for your review !

> 1) Since this function is used only from 027_disable_on_error and not used by
> others, this can be moved to 027_disable_on_error:
> +sub wait_for_subscriptions
> +{
> +   my ($self, $dbname, @subscriptions) = @_;
> +
> +   # Unique-ify the subscriptions passed by the caller
> +   my %unique = map { $_ => 1 } @subscriptions;
> +   my @unique = sort keys %unique;
> +   my $unique_count = scalar(@unique);
> +
> +   # Construct a SQL list from the unique subscription names
> +   my @escaped = map { s/'/''/g; s/\\//g; $_ } @unique;
> +   my $sublist = join(', ', map { "'$_'" } @escaped);
> +
> +   my $polling_sql = qq(
> +   SELECT COUNT(1) = $unique_count FROM
> +   (SELECT s.oid
> +   FROM pg_catalog.pg_subscription s
> +   LEFT JOIN pg_catalog.pg_subscription_rel
> sr
> +   ON sr.srsubid = s.oid
> +   WHERE (sr IS NULL OR sr.srsubstate IN
> ('s', 'r'))
> + AND s.subname IN ($sublist)
> + AND s.subenabled IS TRUE
> +UNION
> +SELECT s.oid
> +   FROM pg_catalog.pg_subscription s
> +   WHERE s.subname IN ($sublist)
> + AND s.subenabled IS FALSE
> +   ) AS synced_or_disabled
> +   );
> +   return $self->poll_query_until($dbname, $polling_sql); }
Fixed.

> 2) The empty line after comment is not required, it can be removed
> +# Create non-unique data in both schemas on the publisher.
> +#
> +for $schema (@schemas)
> +{
Fixed.

> 3) Similarly it can be changed across the file
> +# Wait for the initial subscription synchronizations to finish or fail.
> +#
> +$node_subscriber->wait_for_subscriptions('postgres', @schemas)
> +   or die "Timed out while waiting for subscriber to synchronize
> +data";
> 
> +# Enter unique data for both schemas on the publisher.  This should
> +succeed on # the publisher node, and not cause any additional problems
> +on the subscriber # side either, though disabled subscription "s1" should not
> replicate anything.
> +#
> +for $schema (@schemas)
Fixed.
 
> 4) Since subid is used only at one place, no need of subid variable, you could
> replace subid with subform->oid in LockSharedObject
> +   Datum   values[Natts_pg_subscription];
> +   HeapTuple   tup;
> +   Oid subid;
> +   Form_pg_subscription subform;
> 
> +   subid = subform->oid;
> +   LockSharedObject(SubscriptionRelationId, subid, 0,
> + AccessExclusiveLock);
Fixed.

> 5) "permissions errors" should be "permission errors"
> +  Specifies whether the subscription should be automatically
> disabled
> +  if replicating data from the publisher triggers non-transient 
> errors
> +  such as referential integrity or permissions errors. The default is
> +  false.
> + 
Fixed.

The new patch v8 is shared in [1].

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB83735AA021E0F614A3AB3221ED679%40TYCPR01MB8373.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2021-11-30 Thread osumi.takami...@fujitsu.com
On Tuesday, November 30, 2021 1:10 PM Greg Nancarrow  
wrote:
> On Sat, Nov 27, 2021 at 1:36 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > This v7 uses v26 of skip xid patch [1]
> This patch no longer applies on the latest source.
> Also, the patch is missing an update to doc/src/sgml/catalogs.sgml, for the
> new "subdisableonerr" column of pg_subscription.
Thanks for your review !

Fixed the documentation accordingly. Further,
this comment invoked some more refactoring of codes
since I wrote some internal codes related to
'disable_on_error' in an inconsistent order.
I fixed this by keeping patch's codes
after that of 'two_phase' subscription option as much as possible.

I also conducted both pgindent and pgperltidy.

Now, I'll share the v8 that uses PG
whose commit id is after 8d74fc9 (pg_stat_subscription_workers).

Best Regards,
Takamichi Osumi



v8-0001-Optionally-disable-subscriptions-on-error.patch
Description: v8-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2021-11-29 Thread Greg Nancarrow
On Sat, Nov 27, 2021 at 1:36 AM osumi.takami...@fujitsu.com
 wrote:
>
> This v7 uses v26 of skip xid patch [1]
>

This patch no longer applies on the latest source.
Also, the patch is missing an update to doc/src/sgml/catalogs.sgml,
for the new "subdisableonerr" column of pg_subscription.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Optionally automatically disable logical replication subscriptions on error

2021-11-28 Thread vignesh C
On Fri, Nov 26, 2021 at 8:06 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, November 22, 2021 3:53 PM vignesh C  wrote:
> > Few comments:
> Thank you so much for your review !
>
> > 1) Changes to handle pg_dump are missing. It should be done in
> > dumpSubscription and getSubscriptions
> Fixed.
>
> > 2) "And" is missing
> > --- a/doc/src/sgml/ref/alter_subscription.sgml
> > +++ b/doc/src/sgml/ref/alter_subscription.sgml
> > @@ -201,8 +201,8 @@ ALTER SUBSCRIPTION  > class="parameter">name RENAME TO <
> >information.  The parameters that can be altered
> >are slot_name,
> >synchronous_commit,
> > -  binary, and
> > -  streaming.
> > +  binary,streaming
> > +  disable_on_error.
> > Should be:
> > -  binary, and
> > -  streaming.
> > +  binary,streaming, and
> > +  disable_on_error.
> Fixed.
>
> > 3) Should we change this :
> > +  Specifies whether the subscription should be automatically
> > disabled
> > +  if replicating data from the publisher triggers non-transient 
> > errors
> > +  such as referential integrity or permissions errors. The default 
> > is
> > +  false.
> > to:
> > +  Specifies whether the subscription should be automatically
> > disabled
> > +  while replicating data from the publisher triggers
> > non-transient errors
> > +  such as referential integrity, permissions errors, etc. The
> > default is
> > +  false.
> I preferred the previous description. The option
> "disable_on_error" works with even one error.
> If we use "while", the nuance would be like
> we keep disabling a subscription more than once.
> This situation happens only when user makes
> the subscription enable without resolving the non-transient error,
> which sounds a bit unnatural. So, I wanna keep the previous description.
> If you are not satisfied with this, kindly let me know.
>
> This v7 uses v26 of skip xid patch [1]

Thanks for the updated patch, Few comments:
1) Since this function is used only from 027_disable_on_error and not
used by others, this can be moved to 027_disable_on_error:
+sub wait_for_subscriptions
+{
+   my ($self, $dbname, @subscriptions) = @_;
+
+   # Unique-ify the subscriptions passed by the caller
+   my %unique = map { $_ => 1 } @subscriptions;
+   my @unique = sort keys %unique;
+   my $unique_count = scalar(@unique);
+
+   # Construct a SQL list from the unique subscription names
+   my @escaped = map { s/'/''/g; s/\\//g; $_ } @unique;
+   my $sublist = join(', ', map { "'$_'" } @escaped);
+
+   my $polling_sql = qq(
+   SELECT COUNT(1) = $unique_count FROM
+   (SELECT s.oid
+   FROM pg_catalog.pg_subscription s
+   LEFT JOIN pg_catalog.pg_subscription_rel sr
+   ON sr.srsubid = s.oid
+   WHERE (sr IS NULL OR sr.srsubstate IN
('s', 'r'))
+ AND s.subname IN ($sublist)
+ AND s.subenabled IS TRUE
+UNION
+SELECT s.oid
+   FROM pg_catalog.pg_subscription s
+   WHERE s.subname IN ($sublist)
+ AND s.subenabled IS FALSE
+   ) AS synced_or_disabled
+   );
+   return $self->poll_query_until($dbname, $polling_sql);
+}

2) The empty line after comment is not required, it can be removed
+# Create non-unique data in both schemas on the publisher.
+#
+for $schema (@schemas)
+{

3) Similarly it can be changed across the file
+# Wait for the initial subscription synchronizations to finish or fail.
+#
+$node_subscriber->wait_for_subscriptions('postgres', @schemas)
+   or die "Timed out while waiting for subscriber to synchronize data";

+# Enter unique data for both schemas on the publisher.  This should succeed on
+# the publisher node, and not cause any additional problems on the subscriber
+# side either, though disabled subscription "s1" should not replicate anything.
+#
+for $schema (@schemas)

4) Since subid is used only at one place, no need of subid variable,
you could replace subid with subform->oid in LockSharedObject
+   Datum   values[Natts_pg_subscription];
+   HeapTuple   tup;
+   Oid subid;
+   Form_pg_subscription subform;

+   subid = subform->oid;
+   LockSharedObject(SubscriptionRelationId, subid, 0, AccessExclusiveLock);

5) "permissions errors" should be "permission errors"
+  Specifies whether the subscription should be automatically disabled
+  if replicating data from the publisher triggers non-transient errors
+  such as referential integrity or permissions errors. The default is
+  false.
+ 

Regards,
Vignesh




RE: Optionally automatically disable logical replication subscriptions on error

2021-11-26 Thread osumi.takami...@fujitsu.com
On Monday, November 22, 2021 3:53 PM vignesh C  wrote:
> Few comments:
Thank you so much for your review !

> 1) Changes to handle pg_dump are missing. It should be done in
> dumpSubscription and getSubscriptions
Fixed.

> 2) "And" is missing
> --- a/doc/src/sgml/ref/alter_subscription.sgml
> +++ b/doc/src/sgml/ref/alter_subscription.sgml
> @@ -201,8 +201,8 @@ ALTER SUBSCRIPTION  class="parameter">name RENAME TO <
>information.  The parameters that can be altered
>are slot_name,
>synchronous_commit,
> -  binary, and
> -  streaming.
> +  binary,streaming
> +  disable_on_error.
> Should be:
> -  binary, and
> -  streaming.
> +  binary,streaming, and
> +  disable_on_error.
Fixed.

> 3) Should we change this :
> +  Specifies whether the subscription should be automatically
> disabled
> +  if replicating data from the publisher triggers non-transient 
> errors
> +  such as referential integrity or permissions errors. The default is
> +  false.
> to:
> +  Specifies whether the subscription should be automatically
> disabled
> +  while replicating data from the publisher triggers
> non-transient errors
> +  such as referential integrity, permissions errors, etc. The
> default is
> +  false.
I preferred the previous description. The option
"disable_on_error" works with even one error.
If we use "while", the nuance would be like
we keep disabling a subscription more than once.
This situation happens only when user makes
the subscription enable without resolving the non-transient error,
which sounds a bit unnatural. So, I wanna keep the previous description.
If you are not satisfied with this, kindly let me know.

This v7 uses v26 of skip xid patch [1]

[1] - 
https://www.postgresql.org/message-id/CAD21AoDNe_O%2BCPucd_jQPu3gGGaCLNP%2BJ_kSPNecTdAM8HFPww%40mail.gmail.com

Best Regards,
Takamichi Osumi



v7-0001-Optionally-disable-subscriptions-on-error.patch
Description: v7-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2021-11-21 Thread vignesh C
On Thu, Nov 18, 2021 at 12:52 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Thursday, November 18, 2021 2:08 PM Greg Nancarrow  
> wrote:
> > A minor comment:
> Thanks for your comments !
>
> > doc/src/sgml/ref/alter_subscription.sgml
> > (1) disable_on_err?
> >
> > +  disable_on_err.
> >
> > This doc update names the new parameter as "disable_on_err" instead of
> > "disable_on_error".
> > Also "disable_on_err" appears in a couple of the test case comments.
> Fixed all 3 places.
>
> At the same time, I changed one function name
> from IsSubscriptionDisablingError() to IsTransientError()
> so that it can express what it checks correctly.
> Of course, the return value of true or false
> becomes reverse by this name change, but
> This would make the function more general.
> Also, its comments were fixed.
>
> This version also depends on the v23 of skip xid [1]

Few comments:
1) Changes to handle pg_dump are missing. It should be done in
dumpSubscription and getSubscriptions

2) "And" is missing
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -201,8 +201,8 @@ ALTER SUBSCRIPTION name RENAME TO <
   information.  The parameters that can be altered
   are slot_name,
   synchronous_commit,
-  binary, and
-  streaming.
+  binary,streaming
+  disable_on_error.
Should be:
-  binary, and
-  streaming.
+  binary,streaming, and
+  disable_on_error.

3) Should we change this :
+  Specifies whether the subscription should be automatically disabled
+  if replicating data from the publisher triggers non-transient errors
+  such as referential integrity or permissions errors. The default is
+  false.
to:
+  Specifies whether the subscription should be automatically disabled
+  while replicating data from the publisher triggers
non-transient errors
+  such as referential integrity, permissions errors, etc. The
default is
+  false.

Regards,
Vignesh




RE: Optionally automatically disable logical replication subscriptions on error

2021-11-17 Thread osumi.takami...@fujitsu.com
On Thursday, November 18, 2021 2:08 PM Greg Nancarrow  
wrote:
> A minor comment:
Thanks for your comments !
 
> doc/src/sgml/ref/alter_subscription.sgml
> (1) disable_on_err?
> 
> +  disable_on_err.
> 
> This doc update names the new parameter as "disable_on_err" instead of
> "disable_on_error".
> Also "disable_on_err" appears in a couple of the test case comments.
Fixed all 3 places.

At the same time, I changed one function name
from IsSubscriptionDisablingError() to IsTransientError()
so that it can express what it checks correctly.
Of course, the return value of true or false
becomes reverse by this name change, but
This would make the function more general.
Also, its comments were fixed.

This version also depends on the v23 of skip xid [1]


[1] - 
https://www.postgresql.org/message-id/CAD21AoA5jupM6O%3DpYsyfaxQ1aMX-en8%3DQNgpW6KfXsg7_CS0CQ%40mail.gmail.com


Best Regards,
Takamichi Osumi



v6-0001-Optionally-disable-subscriptions-on-error.patch
Description: v6-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2021-11-17 Thread Greg Nancarrow
On Tue, Nov 16, 2021 at 6:53 PM osumi.takami...@fujitsu.com
 wrote:
>
> This v5 depends on v23 skip xid in [1].
>

A minor comment:

doc/src/sgml/ref/alter_subscription.sgml
(1) disable_on_err?

+  disable_on_err.

This doc update names the new parameter as "disable_on_err" instead of
"disable_on_error".
Also "disable_on_err" appears in a couple of the test case comments.

Regards,
Greg Nancarrow
Fujitsu Australia




RE: Optionally automatically disable logical replication subscriptions on error

2021-11-16 Thread osumi.takami...@fujitsu.com
Thank you for checking the patch !

On Friday, November 12, 2021 1:49 PM Greg Nancarrow  wrote:
> On Thu, Nov 11, 2021 at 8:20 PM osumi.takami...@fujitsu.com
>  wrote:
> Some comments on the v4 patch:
> 
> (1) Patch subject
> I think the patch subject should say "disable" instead of "disabling":
>Optionally disable subscriptions on error
Fixed.

 
> doc/src/sgml/ref/create_subscription.sgml
> (2) spelling mistake
> +  if replicating data from the publisher triggers
> + non-transicent errors
> 
> non-transicent -> non-transient
Fixed.

 
> (I notice Vignesh also pointed this out)
> 
> src/backend/replication/logical/worker.c
> (3) calling geterrcode()
> The new IsSubscriptionDisablingError() function calls geterrcode().
> According to the function comment for geterrcode(), it is only intended for 
> use
> in error callbacks.
> Instead of calling geterrcode(), couldn't the ErrorData from PG_CATCH block be
> passed to IsSubscriptionDisablingError() instead (from which it can get the
> sqlerrcode)?
> 
> (4) DisableSubscriptionOnError
> DisableSubscriptionOnError() is again calling MemoryContextSwitch() and
> CopyErrorData().
> I think the ErrorData from the PG_CATCH block could simply be passed to
> DisableSubscriptionOnError() instead of the memory-context, and the existing
> MemoryContextSwitch() and CopyErrorData() calls could be removed from it.
> 
> AFAICS, applying (3) and (4) above would make the code a lot cleaner.
Fixed.

The updated patch is shared in [1].

[1] - 
https://www.postgresql.org/message-id/TYCPR01MB8373771371B31E1E6CC74B0AED999%40TYCPR01MB8373.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



RE: Optionally automatically disable logical replication subscriptions on error

2021-11-15 Thread osumi.takami...@fujitsu.com
On Friday, November 12, 2021 1:09 PM vignesh C  wrote:
> On Thu, Nov 11, 2021 at 2:50 PM osumi.takami...@fujitsu.com
>  wrote:
> Thanks for the updated patch, Few comments:
> 1)  tab completion should be added for disable_on_error:
> /* Complete "CREATE SUBSCRIPTION  ...  WITH ( " */ else if
> (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
> COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
>   "enabled", "slot_name", "streaming",
>   "synchronous_commit", "two_phase");
Fixed.

> 2) disable_on_error is supported by alter subscription, the same should be
> documented:
> @ -871,11 +886,19 @@ AlterSubscription(ParseState *pstate,
> AlterSubscriptionStmt *stmt,
> {
> supported_opts = (SUBOPT_SLOT_NAME |
> 
> SUBOPT_SYNCHRONOUS_COMMIT | SUBOPT_BINARY |
> -
> SUBOPT_STREAMING);
> +
> SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR);
> 
> parse_subscription_options(pstate,
> stmt->options,
> 
> supported_opts, );
> 
> +   if (IsSet(opts.specified_opts,
> SUBOPT_DISABLE_ON_ERR))
> +   {
> +
> values[Anum_pg_subscription_subdisableonerr - 1]
> +   =
> BoolGetDatum(opts.disableonerr);
> +
> replaces[Anum_pg_subscription_subdisableonerr - 1]
> +   = true;
> +   }
> +
Fixed the documentation. Also, add one test for alter subscription.

 
> 3) Describe subscriptions (dRs+) should include displaying of disableonerr:
> \dRs+ sub1
>   List of subscriptions
> Name |  Owner  | Enabled | Publication | Binary | Streaming | Two
> phase commit | Synchronous commit | Conninfo
> --+-+-+-++---+--
> ++---
>  sub1 | vignesh | t   | {pub1}  | f  | f | d
>  | off| dbname=postgres port=5432
> (1 row)
Fixed.


> 4) I felt transicent should be transient, might be a typo:
> +  Specifies whether the subscription should be automatically
> disabled
> +  if replicating data from the publisher triggers non-transicent 
> errors
> +  such as referential integrity or permissions errors. The default is
> +  false.
Fixed.

> 5) The commented use PostgresNode and use TestLib can be removed:
> +# Test of logical replication subscription self-disabling feature use
> +strict; use warnings; # use PostgresNode; # use TestLib; use
> +PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More
> +tests => 10;
Removed.


Also, my colleague Greg provided an offlist patch to me and
I've incorporated his suggested modifications into this version.
So, I noted his name as a coauthor.

C codes are checked by pgindent again.

This v5 depends on v23 skip xid in [1].

[1] - 
https://www.postgresql.org/message-id/CAD21AoA5jupM6O%3DpYsyfaxQ1aMX-en8%3DQNgpW6KfXsg7_CS0CQ%40mail.gmail.com


Best Regards,
Takamichi Osumi



v5-0001-Optionally-disable-subscriptions-on-error.patch
Description: v5-0001-Optionally-disable-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2021-11-11 Thread Greg Nancarrow
On Thu, Nov 11, 2021 at 8:20 PM osumi.takami...@fujitsu.com
 wrote:
>
> C codes are checked by pgindent.
>
> Note that this depends on the v20 skip xide patch in [1]
>

Some comments on the v4 patch:

(1) Patch subject
I think the patch subject should say "disable" instead of "disabling":
   Optionally disable subscriptions on error

doc/src/sgml/ref/create_subscription.sgml
(2) spelling mistake
+  if replicating data from the publisher triggers non-transicent errors

non-transicent -> non-transient

(I notice Vignesh also pointed this out)

src/backend/replication/logical/worker.c
(3) calling geterrcode()
The new IsSubscriptionDisablingError() function calls geterrcode().
According to the function comment for geterrcode(), it is only
intended for use in error callbacks.
Instead of calling geterrcode(), couldn't the ErrorData from PG_CATCH
block be passed to IsSubscriptionDisablingError() instead (from which
it can get the sqlerrcode)?

(4) DisableSubscriptionOnError
DisableSubscriptionOnError() is again calling MemoryContextSwitch()
and CopyErrorData().
I think the ErrorData from the PG_CATCH block could simply be passed
to DisableSubscriptionOnError() instead of the memory-context, and the
existing MemoryContextSwitch() and CopyErrorData() calls could be
removed from it.

AFAICS, applying (3) and (4) above would make the code a lot cleaner.


Regards,
Greg Nancarrow
Fujitsu Australia




Re: Optionally automatically disable logical replication subscriptions on error

2021-11-11 Thread vignesh C
On Thu, Nov 11, 2021 at 2:50 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Wednesday, November 10, 2021 1:23 PM Greg Nancarrow  
> wrote:
> > On Wed, Nov 10, 2021 at 12:26 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > On Monday, November 8, 2021 10:15 PM vignesh C 
> > wrote:
> > > > Thanks for the updated patch. Please create a Commitfest entry for
> > > > this. It will help to have a look at CFBot results for the patch,
> > > > also if required rebase and post a patch on top of Head.
> > > As requested, created a new entry for this - [1]
> > >
> > > FYI: the skip xid patch has been updated to v20 in [2] but the v3 for
> > > disable_on_error is not affected by this update and still applicable
> > > with no regression.
> > >
> > > [1] - https://commitfest.postgresql.org/36/3407/
> > > [2] -
> > >
> > https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2B
> > EUHbZ
> > > k8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com
> >
> > I had a look at this patch and have a couple of initial review comments for 
> > some
> > issues I spotted:
> Thank you for checking it.
>
>
> > src/backend/commands/subscriptioncmds.c
> > (1) bad array entry assignment
> > The following code block added by the patch assigns
> > "values[Anum_pg_subscription_subdisableonerr - 1]" twice, resulting in it
> > being always set to true, rather than the specified option value:
> >
> > +  if (IsSet(opts.specified_opts, SUBOPT_DISABLE_ON_ERR))  {
> > +values[Anum_pg_subscription_subdisableonerr - 1]
> > +   = BoolGetDatum(opts.disableonerr);
> > + values[Anum_pg_subscription_subdisableonerr - 1]
> > +   = true;
> > +  }
> >
> > The 2nd line is meant to instead be
> > "replaces[Anum_pg_subscription_subdisableonerr - 1] = true".
> > (compare to handling for other similar options)
> Oops, fixed.
>
> > src/backend/replication/logical/worker.c
> > (2) unreachable code?
> > In the patch code there seems to be some instances of unreachable code after
> > re-throwing errors:
> >
> > e.g.
> >
> > + /* If we caught an error above, disable the subscription */ if
> > + (disable_subscription) {
> > +   ReThrowError(DisableSubscriptionOnError(cctx));
> > +   MemoryContextSwitchTo(ecxt);
> > + }
> >
> > + else
> > + {
> > +   PG_RE_THROW();
> > +   MemoryContextSwitchTo(ecxt);
> > + }
> >
> >
> > + if (disable_subscription)
> > + {
> > +   ReThrowError(DisableSubscriptionOnError(cctx));
> > +   MemoryContextSwitchTo(ecxt);
> > + }
> >
> > I'm guessing it was intended to do the "MemoryContextSwitch(ecxt);"
> > before re-throwing (?), but it's not really clear, as in the 1st and 3rd 
> > cases, the
> > DisableSubscriptionOnError() calls anyway immediately switch the memory
> > context to cctx.
> You are right I think.
> Fixed based on an idea below.
>
> After an error happens, for some additional work
> (e.g. to report the stats of table sync/apply worker
> by pgstat_report_subworker_error() or
> to update the catalog by DisableSubscriptionOnError())
> restore the memory context that is used before the error (cctx)
> and save the old memory context of error (ecxt). Then,
> do the additional work and switch the memory context to the ecxt
> just before the rethrow. As you described,
> in contrast to PG_RE_THROW, DisableSubscriptionOnError() changes
> the memory context immediatedly at the top of it,
> so for this case, I don't call the MemoryContextSwitchTo().
>
> Another important thing as my modification
> is a case when LogicalRepApplyLoop failed and
> apply_error_callback_arg.command == 0. In the original
> patch of skip xid, it just calls PG_RE_THROW()
> but my previous v3 codes missed this macro in this case.
> Therefore, I've fixed this part as well.
>
> C codes are checked by pgindent.
>
> Note that this depends on the v20 skip xide patch in [1]
>

Thanks for the updated patch, Few comments:
1)  tab completion should be added for disable_on_error:
/* Complete "CREATE SUBSCRIPTION  ...  WITH ( " */
else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
  "enabled", "slot_name", "streaming",
  "synchronous_commit", "two_phase");

2) disable_on_error is supported by alter subscription, the same
should be documented:
@ -871,11 +886,19 @@ AlterSubscription(ParseState *pstate,
AlterSubscriptionStmt *stmt,
{
supported_opts = (SUBOPT_SLOT_NAME |

SUBOPT_SYNCHRONOUS_COMMIT | SUBOPT_BINARY |
-
SUBOPT_STREAMING);
+
SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR);

parse_subscription_options(pstate,
stmt->options,

supported_opts, );

+   if (IsSet(opts.specified_opts,
SUBOPT_DISABLE_ON_ERR))
+   {
+
values[Anum_pg_subscription_subdisableonerr - 1]
+   =
BoolGetDatum(opts.disableonerr);
+
replaces[Anum_pg_subscription_subdisableonerr - 1]
+  

RE: Optionally automatically disable logical replication subscriptions on error

2021-11-11 Thread osumi.takami...@fujitsu.com
On Wednesday, November 10, 2021 1:23 PM Greg Nancarrow  
wrote:
> On Wed, Nov 10, 2021 at 12:26 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > On Monday, November 8, 2021 10:15 PM vignesh C 
> wrote:
> > > Thanks for the updated patch. Please create a Commitfest entry for
> > > this. It will help to have a look at CFBot results for the patch,
> > > also if required rebase and post a patch on top of Head.
> > As requested, created a new entry for this - [1]
> >
> > FYI: the skip xid patch has been updated to v20 in [2] but the v3 for
> > disable_on_error is not affected by this update and still applicable
> > with no regression.
> >
> > [1] - https://commitfest.postgresql.org/36/3407/
> > [2] -
> >
> https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2B
> EUHbZ
> > k8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com
> 
> I had a look at this patch and have a couple of initial review comments for 
> some
> issues I spotted:
Thank you for checking it.


> src/backend/commands/subscriptioncmds.c
> (1) bad array entry assignment
> The following code block added by the patch assigns
> "values[Anum_pg_subscription_subdisableonerr - 1]" twice, resulting in it
> being always set to true, rather than the specified option value:
> 
> +  if (IsSet(opts.specified_opts, SUBOPT_DISABLE_ON_ERR))  {
> +values[Anum_pg_subscription_subdisableonerr - 1]
> +   = BoolGetDatum(opts.disableonerr);
> + values[Anum_pg_subscription_subdisableonerr - 1]
> +   = true;
> +  }
> 
> The 2nd line is meant to instead be
> "replaces[Anum_pg_subscription_subdisableonerr - 1] = true".
> (compare to handling for other similar options)
Oops, fixed.
 
> src/backend/replication/logical/worker.c
> (2) unreachable code?
> In the patch code there seems to be some instances of unreachable code after
> re-throwing errors:
> 
> e.g.
> 
> + /* If we caught an error above, disable the subscription */ if
> + (disable_subscription) {
> +   ReThrowError(DisableSubscriptionOnError(cctx));
> +   MemoryContextSwitchTo(ecxt);
> + }
> 
> + else
> + {
> +   PG_RE_THROW();
> +   MemoryContextSwitchTo(ecxt);
> + }
> 
> 
> + if (disable_subscription)
> + {
> +   ReThrowError(DisableSubscriptionOnError(cctx));
> +   MemoryContextSwitchTo(ecxt);
> + }
> 
> I'm guessing it was intended to do the "MemoryContextSwitch(ecxt);"
> before re-throwing (?), but it's not really clear, as in the 1st and 3rd 
> cases, the
> DisableSubscriptionOnError() calls anyway immediately switch the memory
> context to cctx.
You are right I think.
Fixed based on an idea below.

After an error happens, for some additional work
(e.g. to report the stats of table sync/apply worker
by pgstat_report_subworker_error() or
to update the catalog by DisableSubscriptionOnError())
restore the memory context that is used before the error (cctx)
and save the old memory context of error (ecxt). Then,
do the additional work and switch the memory context to the ecxt
just before the rethrow. As you described, 
in contrast to PG_RE_THROW, DisableSubscriptionOnError() changes
the memory context immediatedly at the top of it,
so for this case, I don't call the MemoryContextSwitchTo().

Another important thing as my modification
is a case when LogicalRepApplyLoop failed and
apply_error_callback_arg.command == 0. In the original
patch of skip xid, it just calls PG_RE_THROW()
but my previous v3 codes missed this macro in this case.
Therefore, I've fixed this part as well.

C codes are checked by pgindent.

Note that this depends on the v20 skip xide patch in [1]

[1] - 
https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2BEUHbZk8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com

Best Regards,
Takamichi Osumi



v4-0001-Optionally-disabling-subscriptions-on-error.patch
Description: v4-0001-Optionally-disabling-subscriptions-on-error.patch


Re: Optionally automatically disable logical replication subscriptions on error

2021-11-09 Thread Greg Nancarrow
On Wed, Nov 10, 2021 at 3:22 PM Greg Nancarrow  wrote:
>
> I had a look at this patch and have a couple of initial review
> comments for some issues I spotted:
>

Incidentally, I found that the v3 patch only applies after the skip xid v20
patch [1] has been applied.

[2] -
https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2BEUHbZk8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com


Regards,
Greg Nancarrow
Fujitsu Australia


Re: Optionally automatically disable logical replication subscriptions on error

2021-11-09 Thread Greg Nancarrow
On Wed, Nov 10, 2021 at 12:26 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, November 8, 2021 10:15 PM vignesh C  wrote:
> > Thanks for the updated patch. Please create a Commitfest entry for this. It 
> > will
> > help to have a look at CFBot results for the patch, also if required rebase 
> > and
> > post a patch on top of Head.
> As requested, created a new entry for this - [1]
>
> FYI: the skip xid patch has been updated to v20 in [2]
> but the v3 for disable_on_error is not affected by this update
> and still applicable with no regression.
>
> [1] - https://commitfest.postgresql.org/36/3407/
> [2] - 
> https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2BEUHbZk8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com
>

I had a look at this patch and have a couple of initial review
comments for some issues I spotted:

src/backend/commands/subscriptioncmds.c
(1) bad array entry assignment
The following code block added by the patch assigns
"values[Anum_pg_subscription_subdisableonerr - 1]" twice, resulting in
it being always set to true, rather than the specified option value:

+  if (IsSet(opts.specified_opts, SUBOPT_DISABLE_ON_ERR))
+  {
+values[Anum_pg_subscription_subdisableonerr - 1]
+   = BoolGetDatum(opts.disableonerr);
+ values[Anum_pg_subscription_subdisableonerr - 1]
+   = true;
+  }

The 2nd line is meant to instead be
"replaces[Anum_pg_subscription_subdisableonerr - 1] = true".
(compare to handling for other similar options)

src/backend/replication/logical/worker.c
(2) unreachable code?
In the patch code there seems to be some instances of unreachable code
after re-throwing errors:

e.g.

+ /* If we caught an error above, disable the subscription */
+ if (disable_subscription)
+ {
+   ReThrowError(DisableSubscriptionOnError(cctx));
+   MemoryContextSwitchTo(ecxt);
+ }

+ else
+ {
+   PG_RE_THROW();
+   MemoryContextSwitchTo(ecxt);
+ }


+ if (disable_subscription)
+ {
+   ReThrowError(DisableSubscriptionOnError(cctx));
+   MemoryContextSwitchTo(ecxt);
+ }

I'm guessing it was intended to do the "MemoryContextSwitch(ecxt);"
before re-throwing (?), but it's not really clear, as in the 1st and
3rd cases, the DisableSubscriptionOnError() calls anyway immediately
switch the memory context to cctx.


Regards,
Greg Nancarrow
Fujitsu Australia




  1   2   >