Re: Minimal logical decoding on standbys
On Tue, Apr 11, 2023 at 01:10:57PM -0700, Andres Freund wrote: > On 2023-04-11 11:04:50 +0200, Drouvot, Bertrand wrote: > > On 4/11/23 10:55 AM, Drouvot, Bertrand wrote: > > > I think we might want to add: > > > > > > $node_primary->wait_for_replay_catchup($node_standby); > > > > > > before calling the slot creation. > Pushed. Seems like a clear race in the test, so I didn't think it was worth > waiting for testing it on hoverfly. We'll see what happens in the next run. > I think we should lower the log level, but perhaps wait for a few more cycles > in case there are random failures? Fine with me. > I wonder if we should make the connections in poll_query_until to reduce > verbosity - it's pretty annoying how much that can bloat the log. Perhaps also > introduce some backoff? It's really annoying to have to trawl through all > those logs when there's a problem. Agreed. My ranked wish list for poll_query_until is: 1. Exponential backoff 2. Closed-loop time control via Time::HiRes or similar, instead of assuming that ten loops complete in ~1s. I've seen the loop take 3x as long as the intended timeout. 3. Connect less often than today's once per probe
Re: Minimal logical decoding on standbys
Andres Freund writes: > I think we should lower the log level, but perhaps wait for a few more cycles > in case there are random failures? Removing -log_min_messages = 'debug2' -log_error_verbosity = verbose not only reduces 035's log output volume from 1.6MB to 260kB, but also speeds it up nontrivially: on my machine it takes about 8.50 sec as of HEAD, but 8.09 sec after silencing the extra logging. So I definitely want to see that out of there. regards, tom lane
Re: Minimal logical decoding on standbys
Hi, On 2023-04-11 11:04:50 +0200, Drouvot, Bertrand wrote: > On 4/11/23 10:55 AM, Drouvot, Bertrand wrote: > > Hi, > > > > I think we might want to add: > > > > $node_primary->wait_for_replay_catchup($node_standby); > > > > before calling the slot creation. > > > > It's done in the attached, would it be possible to give it a try please? > > Actually, let's also wait for the cascading standby to catchup too, like done > in the attached. Pushed. Seems like a clear race in the test, so I didn't think it was worth waiting for testing it on hoverfly. I think we should lower the log level, but perhaps wait for a few more cycles in case there are random failures? I wonder if we should make the connections in poll_query_until to reduce verbosity - it's pretty annoying how much that can bloat the log. Perhaps also introduce some backoff? It's really annoying to have to trawl through all those logs when there's a problem. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, On 4/11/23 10:55 AM, Drouvot, Bertrand wrote: Hi, I think we might want to add: $node_primary->wait_for_replay_catchup($node_standby); before calling the slot creation. It's done in the attached, would it be possible to give it a try please? Actually, let's also wait for the cascading standby to catchup too, like done in the attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index ba98a18bd2..94a8384c31 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -653,9 +653,15 @@ $node_standby->reload; $node_primary->psql('postgres', q[CREATE DATABASE testdb]); $node_primary->safe_psql('testdb', qq[CREATE TABLE decoding_test(x integer, y text);]); +# Wait for the standby to catchup before creating the slots +$node_primary->wait_for_replay_catchup($node_standby); + # create the logical slots create_logical_slots($node_standby, 'promotion_'); +# Wait for the cascading standby to catchup before creating the slots +$node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary); + # create the logical slots on the cascading standby too create_logical_slots($node_cascading_standby, 'promotion_');
Re: Minimal logical decoding on standbys
Hi, On 4/11/23 10:20 AM, Drouvot, Bertrand wrote: Hi, On 4/11/23 7:36 AM, Noah Misch wrote: On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote: --- /dev/null +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -0,0 +1,720 @@ +# logical decoding on standby : test logical decoding, +# recovery conflict and standby promotion. ... +$node_primary->append_conf('postgresql.conf', q{ +wal_level = 'logical' +max_replication_slots = 4 +max_wal_senders = 4 +log_min_messages = 'debug2' +log_error_verbosity = verbose +}); Buildfarm member hoverfly stopped reporting in when this test joined the tree. It's currently been stuck here for 140 minutes: Thanks for the report! It's looping on: 2023-04-11 02:57:52.516 UTC [62718288:5] 035_standby_logical_decoding.pl LOG: 0: statement: SELECT restart_lsn IS NOT NULL FROM pg_catalog.pg_replication_slots WHERE slot_name = 'promotion_inactiveslot' And the reason is that the slot is not being created: $ grep "CREATE_REPLICATION_SLOT" 035_standby_logical_decoding_standby.log | tail -2 2023-04-11 02:57:47.287 UTC [9241178:15] 035_standby_logical_decoding.pl STATEMENT: CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing') 2023-04-11 02:57:47.622 UTC [9241178:23] 035_standby_logical_decoding.pl STATEMENT: CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing') Not sure why the slot is not being created. There is also "replication apply delay" increasing: 2023-04-11 02:57:49.183 UTC [13304488:253] DEBUG: 0: sendtime 2023-04-11 02:57:49.111363+00 receipttime 2023-04-11 02:57:49.183512+00 replication apply delay 644 ms transfer latency 73 ms 2023-04-11 02:57:49.184 UTC [13304488:259] DEBUG: 0: sendtime 2023-04-11 02:57:49.183461+00 receipttime 2023-04-11 02:57:49.1842+00 replication apply delay 645 ms transfer latency 1 ms 2023-04-11 02:57:49.221 UTC [13304488:265] DEBUG: 0: sendtime 2023-04-11 02:57:49.184166+00 receipttime 2023-04-11 02:57:49.221059+00 replication apply delay 682 ms transfer latency 37 ms 2023-04-11 02:57:49.222 UTC [13304488:271] DEBUG: 0: sendtime 2023-04-11 02:57:49.221003+00 receipttime 2023-04-11 02:57:49.222144+00 replication apply delay 683 ms transfer latency 2 ms 2023-04-11 02:57:49.222 UTC [13304488:277] DEBUG: 0: sendtime 2023-04-11 02:57:49.222095+00 receipttime 2023-04-11 02:57:49.2228+00 replication apply delay 684 ms transfer latency 1 ms Noah, I think hoverfly is yours, would it be possible to have access (I'm not an AIX expert though) or check if you see a slot creation hanging and if so why? Well, we can see in 035_standby_logical_decoding_standby.log: 2023-04-11 02:57:49.180 UTC [62718258:5] [unknown] FATAL: 3D000: database "testdb" does not exist While, on the primary: 2023-04-11 02:57:48.505 UTC [62718254:5] 035_standby_logical_decoding.pl LOG: 0: statement: CREATE DATABASE testdb The TAP test is doing: " ## # Test standby promotion and logical decoding behavior # after the standby gets promoted. ## $node_standby->reload; $node_primary->psql('postgres', q[CREATE DATABASE testdb]); $node_primary->safe_psql('testdb', qq[CREATE TABLE decoding_test(x integer, y text);]); # create the logical slots create_logical_slots($node_standby, 'promotion_'); " I think we might want to add: $node_primary->wait_for_replay_catchup($node_standby); before calling the slot creation. It's done in the attached, would it be possible to give it a try please? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index ba98a18bd2..ad845aee28 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -653,6 +653,9 @@ $node_standby->reload; $node_primary->psql('postgres', q[CREATE DATABASE testdb]); $node_primary->safe_psql('testdb', qq[CREATE TABLE decoding_test(x integer, y text);]); +# Wait for the standby to catchup before creating the slots +$node_primary->wait_for_replay_catchup($node_standby); + # create the logical slots create_logical_slots($node_standby, 'promotion_');
Re: Minimal logical decoding on standbys
Hi, On 4/11/23 7:36 AM, Noah Misch wrote: On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote: --- /dev/null +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -0,0 +1,720 @@ +# logical decoding on standby : test logical decoding, +# recovery conflict and standby promotion. ... +$node_primary->append_conf('postgresql.conf', q{ +wal_level = 'logical' +max_replication_slots = 4 +max_wal_senders = 4 +log_min_messages = 'debug2' +log_error_verbosity = verbose +}); Buildfarm member hoverfly stopped reporting in when this test joined the tree. It's currently been stuck here for 140 minutes: Thanks for the report! It's looping on: 2023-04-11 02:57:52.516 UTC [62718288:5] 035_standby_logical_decoding.pl LOG: 0: statement: SELECT restart_lsn IS NOT NULL FROM pg_catalog.pg_replication_slots WHERE slot_name = 'promotion_inactiveslot' And the reason is that the slot is not being created: $ grep "CREATE_REPLICATION_SLOT" 035_standby_logical_decoding_standby.log | tail -2 2023-04-11 02:57:47.287 UTC [9241178:15] 035_standby_logical_decoding.pl STATEMENT: CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing') 2023-04-11 02:57:47.622 UTC [9241178:23] 035_standby_logical_decoding.pl STATEMENT: CREATE_REPLICATION_SLOT "otherslot" LOGICAL "test_decoding" ( SNAPSHOT 'nothing') Not sure why the slot is not being created. There is also "replication apply delay" increasing: 2023-04-11 02:57:49.183 UTC [13304488:253] DEBUG: 0: sendtime 2023-04-11 02:57:49.111363+00 receipttime 2023-04-11 02:57:49.183512+00 replication apply delay 644 ms transfer latency 73 ms 2023-04-11 02:57:49.184 UTC [13304488:259] DEBUG: 0: sendtime 2023-04-11 02:57:49.183461+00 receipttime 2023-04-11 02:57:49.1842+00 replication apply delay 645 ms transfer latency 1 ms 2023-04-11 02:57:49.221 UTC [13304488:265] DEBUG: 0: sendtime 2023-04-11 02:57:49.184166+00 receipttime 2023-04-11 02:57:49.221059+00 replication apply delay 682 ms transfer latency 37 ms 2023-04-11 02:57:49.222 UTC [13304488:271] DEBUG: 0: sendtime 2023-04-11 02:57:49.221003+00 receipttime 2023-04-11 02:57:49.222144+00 replication apply delay 683 ms transfer latency 2 ms 2023-04-11 02:57:49.222 UTC [13304488:277] DEBUG: 0: sendtime 2023-04-11 02:57:49.222095+00 receipttime 2023-04-11 02:57:49.2228+00 replication apply delay 684 ms transfer latency 1 ms Noah, I think hoverfly is yours, would it be possible to have access (I'm not an AIX expert though) or check if you see a slot creation hanging and if so why? === $ tail -n5 regress_log_035_standby_logical_decoding [02:57:48.390](0.100s) ok 66 - otherslot on standby not dropped ### Reloading node "standby" # Running: pg_ctl -D /scratch/nm/farm/xlc64v16/HEAD/pgsql.build/src/test/recovery/tmp_check/t_035_standby_logical_decoding_standby_data/pgdata reload server signaled === I've posted a tarball of the current logs at https://drive.google.com/file/d/1JIZ5hSHBsKjEgU5WOGHOqXB7Z_-9XT5u/view?usp=sharing. The test times out (PG_TEST_TIMEOUT_DEFAULT=5400), and uploading logs then fails with 413 Request Entity Too Large. Is the above log_min_messages='debug2' important? Removing that may make the logs small enough to upload normally. I think debug2 might still be useful while investigating this issue (I'll compare a working TAP run with this one). Regards -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote: > --- /dev/null > +++ b/src/test/recovery/t/035_standby_logical_decoding.pl > @@ -0,0 +1,720 @@ > +# logical decoding on standby : test logical decoding, > +# recovery conflict and standby promotion. ... > +$node_primary->append_conf('postgresql.conf', q{ > +wal_level = 'logical' > +max_replication_slots = 4 > +max_wal_senders = 4 > +log_min_messages = 'debug2' > +log_error_verbosity = verbose > +}); Buildfarm member hoverfly stopped reporting in when this test joined the tree. It's currently been stuck here for 140 minutes: === $ tail -n5 regress_log_035_standby_logical_decoding [02:57:48.390](0.100s) ok 66 - otherslot on standby not dropped ### Reloading node "standby" # Running: pg_ctl -D /scratch/nm/farm/xlc64v16/HEAD/pgsql.build/src/test/recovery/tmp_check/t_035_standby_logical_decoding_standby_data/pgdata reload server signaled === I've posted a tarball of the current logs at https://drive.google.com/file/d/1JIZ5hSHBsKjEgU5WOGHOqXB7Z_-9XT5u/view?usp=sharing. The test times out (PG_TEST_TIMEOUT_DEFAULT=5400), and uploading logs then fails with 413 Request Entity Too Large. Is the above log_min_messages='debug2' important? Removing that may make the logs small enough to upload normally.
Re: Minimal logical decoding on standbys
On 4/8/23 5:27 AM, Andres Freund wrote: Hi, On 2023-04-07 14:27:09 -0700, Andres Freund wrote: I think I'll push these in a few hours. While this needed more changes than I'd like shortly before the freeze, I think they're largely not in very interesting bits and pieces - and this feature has been in the works for about three eternities, and it is blocking a bunch of highly requested features. If anybody still has energy, I would appreciate a look at 0001, 0002, the new pieces I added, to make what's now 0003 and 0004 cleaner. Pushed. Thanks all! I squashed some of the changes. There didn't seem to be a need for a separate stats commit, or a separate docs commit. Besides that, I did find plenty of grammar issues, and a bunch of formatting issues. Let's see what the buildfarm says. Thanks to everyone for working on this feature -- this should have a big impact on users of logical replication! While it still needs to get through the beta period etc. this is a big milestone for what's been a multi-year effort to support this. Thanks, Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: Minimal logical decoding on standbys
Hi, On 2023-04-07 14:27:09 -0700, Andres Freund wrote: > I think I'll push these in a few hours. While this needed more changes than > I'd like shortly before the freeze, I think they're largely not in very > interesting bits and pieces - and this feature has been in the works for about > three eternities, and it is blocking a bunch of highly requested features. > > If anybody still has energy, I would appreciate a look at 0001, 0002, the new > pieces I added, to make what's now 0003 and 0004 cleaner. Pushed. Thanks all! I squashed some of the changes. There didn't seem to be a need for a separate stats commit, or a separate docs commit. Besides that, I did find plenty of grammar issues, and a bunch of formatting issues. Let's see what the buildfarm says. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, New wording works for me, thanks! Bertrand Le sam. 8 avr. 2023, 08:26, Andres Freund a écrit : > Hi, > > On 2023-04-07 11:12:26 -0700, Andres Freund wrote: > > + > > + > > + confl_active_logicalslot > bigint > > + > > + > > + Number of active logical slots in this database that have been > > + invalidated because they conflict with recovery (note that > inactive ones > > + are also invalidated but do not increment this counter) > > + > > + > > > > > > > > This seems wrong to me. The counter is not for invalidated slots, it's for > recovery conflict interrupts. If phrased that way, the parenthetical would > be > unnecessary. > > I think something like >Number of uses of logical slots in this database that have been >canceled due to old snapshots or a too low linkend="guc-wal-level"/> >on the primary > > would work and fit with the documentation of the other fields? Reads a bit > stilted, but so do several of the other fields... > > Greetings, > > Andres Freund >
Re: Minimal logical decoding on standbys
Hi, On 2023-04-07 11:12:26 -0700, Andres Freund wrote: > + > + > + confl_active_logicalslot > bigint > + > + > + Number of active logical slots in this database that have been > + invalidated because they conflict with recovery (note that inactive > ones > + are also invalidated but do not increment this counter) > + > + > > > This seems wrong to me. The counter is not for invalidated slots, it's for recovery conflict interrupts. If phrased that way, the parenthetical would be unnecessary. I think something like Number of uses of logical slots in this database that have been canceled due to old snapshots or a too low on the primary would work and fit with the documentation of the other fields? Reads a bit stilted, but so do several of the other fields... Greetings, Andres Freund
Re: Minimal logical decoding on standbys
On Sat, Apr 8, 2023 at 9:31 AM Andres Freund wrote: > > On 2023-04-08 09:15:05 +0530, Amit Kapila wrote: > > The new approach for invalidation looks clean. BTW, I see minor > > inconsistency in the following two error messages (errmsg): > > Thanks for checking. > > > > if (MyReplicationSlot->data.invalidated == RS_INVAL_WAL) > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("can no longer get changes from replication slot \"%s\"", > > NameStr(MyReplicationSlot->data.name)), > > errdetail("This slot has been invalidated because it exceeded the > > maximum reserved size."))); > > > > if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) > > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > errmsg("cannot read from logical replication slot \"%s\"", > > NameStr(MyReplicationSlot->data.name)), > > errdetail("This slot has been invalidated because it was conflicting > > with recovery."))); > > > > Won't it be better to keep the same errmsg in the above two cases? > > Probably - do you have a preference? I think the former is a bit better? > +1 for the former. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
On 4/8/23 12:01 AM, Andres Freund wrote: Hi, On 2023-04-08 09:15:05 +0530, Amit Kapila wrote: The new approach for invalidation looks clean. BTW, I see minor inconsistency in the following two error messages (errmsg): Thanks for checking. if (MyReplicationSlot->data.invalidated == RS_INVAL_WAL) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("can no longer get changes from replication slot \"%s\"", NameStr(MyReplicationSlot->data.name)), errdetail("This slot has been invalidated because it exceeded the maximum reserved size."))); if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("cannot read from logical replication slot \"%s\"", NameStr(MyReplicationSlot->data.name)), errdetail("This slot has been invalidated because it was conflicting with recovery."))); Won't it be better to keep the same errmsg in the above two cases? Probably - do you have a preference? I think the former is a bit better? +1 for the former, though perhaps "receive" instead of "get?" Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: Minimal logical decoding on standbys
Hi, On 2023-04-08 09:15:05 +0530, Amit Kapila wrote: > The new approach for invalidation looks clean. BTW, I see minor > inconsistency in the following two error messages (errmsg): Thanks for checking. > if (MyReplicationSlot->data.invalidated == RS_INVAL_WAL) > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("can no longer get changes from replication slot \"%s\"", > NameStr(MyReplicationSlot->data.name)), > errdetail("This slot has been invalidated because it exceeded the > maximum reserved size."))); > > if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("cannot read from logical replication slot \"%s\"", > NameStr(MyReplicationSlot->data.name)), > errdetail("This slot has been invalidated because it was conflicting > with recovery."))); > > Won't it be better to keep the same errmsg in the above two cases? Probably - do you have a preference? I think the former is a bit better? Greetings, Andres Freund
Re: Minimal logical decoding on standbys
On Fri, Apr 7, 2023 at 11:42 PM Andres Freund wrote: > > On 2023-04-07 08:47:57 -0700, Andres Freund wrote: > > Integrated all of these. > > Here's my current version. Changes: > - Integrated Bertrand's changes > - polished commit messages of 0001-0003 > - edited code comments for 0003, including > InvalidateObsoleteReplicationSlots()'s header > - added a bump of SLOT_VERSION to 0001 > - moved addition of pg_log_standby_snapshot() to 0007 > - added a catversion bump for pg_log_standby_snapshot() > - moved all the bits dealing with procsignals from 0003 to 0004, now the split > makes sense IMO > - combined a few more sucessive ->safe_psql() calls > The new approach for invalidation looks clean. BTW, I see minor inconsistency in the following two error messages (errmsg): if (MyReplicationSlot->data.invalidated == RS_INVAL_WAL) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("can no longer get changes from replication slot \"%s\"", NameStr(MyReplicationSlot->data.name)), errdetail("This slot has been invalidated because it exceeded the maximum reserved size."))); if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("cannot read from logical replication slot \"%s\"", NameStr(MyReplicationSlot->data.name)), errdetail("This slot has been invalidated because it was conflicting with recovery."))); Won't it be better to keep the same errmsg in the above two cases? -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 2023-04-07 18:32:04 -0400, Melanie Plageman wrote: > Code review only of 0001-0005. > > I noticed you had two 0008, btw. Yea, sorry for that. One was the older version. Just before sending the patch I saw another occurance of a test failure, which I then fixed. In the course of that I changed something in the patch subject. > On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote: > > Hi, > > > > On 2023-04-07 08:47:57 -0700, Andres Freund wrote: > > > Integrated all of these. > > > > From 0e038eb5dfddec500fbf4625775d1fa508a208f6 Mon Sep 17 00:00:00 2001 > > From: Andres Freund > > Date: Thu, 6 Apr 2023 20:00:07 -0700 > > Subject: [PATCH va67 1/9] Replace a replication slot's invalidated_at LSN > > with > > an enum > > > > diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h > > index 8872c80cdfe..ebcb637baed 100644 > > --- a/src/include/replication/slot.h > > +++ b/src/include/replication/slot.h > > @@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency > > RS_TEMPORARY > > } ReplicationSlotPersistency; > > > > +/* > > + * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the > > + * 'invalidated' field is set to a value other than _NONE. > > + */ > > +typedef enum ReplicationSlotInvalidationCause > > +{ > > + RS_INVAL_NONE, > > + /* required WAL has been removed */ > > I just wonder if RS_INVAL_WAL is too generic. Something like > RS_INVAL_WAL_MISSING or similar may be better since it seems there are > other inavlidation causes that may be related to WAL. Renamed to RS_INVAL_WAL_REMOVED > > From 52c25cc15abc4470d19e305d245b9362e6b8d6a3 Mon Sep 17 00:00:00 2001 > > From: Andres Freund > > Date: Fri, 7 Apr 2023 09:32:48 -0700 > > Subject: [PATCH va67 3/9] Support invalidating replication slots due to > > horizon and wal_level > > MIME-Version: 1.0 > > Content-Type: text/plain; charset=UTF-8 > > Content-Transfer-Encoding: 8bit > > > > Needed for supporting logical decoding on a standby. The new invalidation > > methods will be used in a subsequent commit. > > > > You probably are aware, but applying 0003 and 0004 both gives me two > warnings: > > warning: 1 line adds whitespace errors. > Warning: commit message did not conform to UTF-8. > You may want to amend it after fixing the message, or set the config > variable i18n.commitEncoding to the encoding your project uses. I did see the whitespace error, but not the encoding error. We have a bunch of other commit messages with Fabrizio's name "properly spelled" in, so I think that's ok. > > diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c > > index df23b7ed31e..c2a9accebf6 100644 > > --- a/src/backend/replication/slot.c > > +++ b/src/backend/replication/slot.c > > @@ -1241,8 +1241,58 @@ ReplicationSlotReserveWal(void) > > } > > > > /* > > - * Helper for InvalidateObsoleteReplicationSlots -- acquires the given slot > > - * and mark it invalid, if necessary and possible. > > + * Report that replication slot needs to be invalidated > > + */ > > +static void > > +ReportSlotInvalidation(ReplicationSlotInvalidationCause cause, > > + bool terminating, > > + int pid, > > + NameData slotname, > > + XLogRecPtr restart_lsn, > > + XLogRecPtr oldestLSN, > > + TransactionId > > snapshotConflictHorizon) > > +{ > > + StringInfoData err_detail; > > + boolhint = false; > > + > > + initStringInfo(&err_detail); > > + > > + switch (cause) > > + { > > + case RS_INVAL_WAL: > > + hint = true; > > + appendStringInfo(&err_detail, _("The slot's restart_lsn > > %X/%X exceeds the limit by %llu bytes."), > > + > > LSN_FORMAT_ARGS(restart_lsn), > > I'm not sure what the below cast is meant to do. If you are trying to > protect against overflow/underflow, I think you'd need to cast before > doing the subtraction. > > +(unsigned long long) > > (oldestLSN - restart_lsn)); > > + break; That's our current way of passing 64bit numbers to format string functions. It ends up as a 64bit number everywhere, even windows (with its stupid ILP32 model). > > + case RS_INVAL_HORIZON: > > + appendStringInfo(&err_detail, _("The slot conflicted > > with xid horizon %u."), > > + > > snapshotConflictHorizon); > > + break; > > + > > + case RS_INVAL_WAL_LEVEL: > > + appendStringInfo(&err_detail, _("Logical decoding on > > standby requires wal_level to be at least logical on the primary server")); > > + break; > > + ca
Re: Minimal logical decoding on standbys
Code review only of 0001-0005. I noticed you had two 0008, btw. On Fri, Apr 07, 2023 at 11:12:26AM -0700, Andres Freund wrote: > Hi, > > On 2023-04-07 08:47:57 -0700, Andres Freund wrote: > > Integrated all of these. > > From 0e038eb5dfddec500fbf4625775d1fa508a208f6 Mon Sep 17 00:00:00 2001 > From: Andres Freund > Date: Thu, 6 Apr 2023 20:00:07 -0700 > Subject: [PATCH va67 1/9] Replace a replication slot's invalidated_at LSN with > an enum > > diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h > index 8872c80cdfe..ebcb637baed 100644 > --- a/src/include/replication/slot.h > +++ b/src/include/replication/slot.h > @@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency > RS_TEMPORARY > } ReplicationSlotPersistency; > > +/* > + * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the > + * 'invalidated' field is set to a value other than _NONE. > + */ > +typedef enum ReplicationSlotInvalidationCause > +{ > + RS_INVAL_NONE, > + /* required WAL has been removed */ I just wonder if RS_INVAL_WAL is too generic. Something like RS_INVAL_WAL_MISSING or similar may be better since it seems there are other inavlidation causes that may be related to WAL. > + RS_INVAL_WAL, > +} ReplicationSlotInvalidationCause; > + 0002 LGTM > From 52c25cc15abc4470d19e305d245b9362e6b8d6a3 Mon Sep 17 00:00:00 2001 > From: Andres Freund > Date: Fri, 7 Apr 2023 09:32:48 -0700 > Subject: [PATCH va67 3/9] Support invalidating replication slots due to > horizon and wal_level > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > Needed for supporting logical decoding on a standby. The new invalidation > methods will be used in a subsequent commit. > You probably are aware, but applying 0003 and 0004 both gives me two warnings: warning: 1 line adds whitespace errors. Warning: commit message did not conform to UTF-8. You may want to amend it after fixing the message, or set the config variable i18n.commitEncoding to the encoding your project uses. > diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c > index df23b7ed31e..c2a9accebf6 100644 > --- a/src/backend/replication/slot.c > +++ b/src/backend/replication/slot.c > @@ -1241,8 +1241,58 @@ ReplicationSlotReserveWal(void) > } > > /* > - * Helper for InvalidateObsoleteReplicationSlots -- acquires the given slot > - * and mark it invalid, if necessary and possible. > + * Report that replication slot needs to be invalidated > + */ > +static void > +ReportSlotInvalidation(ReplicationSlotInvalidationCause cause, > +bool terminating, > +int pid, > +NameData slotname, > +XLogRecPtr restart_lsn, > +XLogRecPtr oldestLSN, > +TransactionId > snapshotConflictHorizon) > +{ > + StringInfoData err_detail; > + boolhint = false; > + > + initStringInfo(&err_detail); > + > + switch (cause) > + { > + case RS_INVAL_WAL: > + hint = true; > + appendStringInfo(&err_detail, _("The slot's restart_lsn > %X/%X exceeds the limit by %llu bytes."), > + > LSN_FORMAT_ARGS(restart_lsn), I'm not sure what the below cast is meant to do. If you are trying to protect against overflow/underflow, I think you'd need to cast before doing the subtraction. > + (unsigned long long) > (oldestLSN - restart_lsn)); > + break; > + case RS_INVAL_HORIZON: > + appendStringInfo(&err_detail, _("The slot conflicted > with xid horizon %u."), > + > snapshotConflictHorizon); > + break; > + > + case RS_INVAL_WAL_LEVEL: > + appendStringInfo(&err_detail, _("Logical decoding on > standby requires wal_level to be at least logical on the primary server")); > + break; > + case RS_INVAL_NONE: > + pg_unreachable(); > + } This ereport is quite hard to read. Is there any simplification you can do of the ternaries without undue duplication? > + ereport(LOG, > + terminating ? > + errmsg("terminating process %d to release replication > slot \"%s\"", > +pid, NameStr(slotname)) : > + errmsg("invalidating obsolete replication slot \"%s\"", > +NameStr(slotname)), > + errdetail_internal("%s", err_detail.data), > + hint ? errhint("You might need to increase > max_slot_wal_keep_size.") : 0); > + > + pfree
Re: Minimal logical decoding on standbys
I gave a very quick look at 0001 and 0003. I find no fault with 0001. It was clear back when we added that stuff that invalidated_at was not terribly useful -- I was just too conservative to not have it -- but now that a lot of time has passed and we haven't done anything with it, removing it seems perfectly OK. As for 0003, I have no further concerns about the translatability. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "El miedo atento y previsor es la madre de la seguridad" (E. Burke)
Re: Minimal logical decoding on standbys
Hi, On 2023-04-07 22:54:01 +0200, Drouvot, Bertrand wrote: > That looks good to me Cool. I think I'll push these in a few hours. While this needed more changes than I'd like shortly before the freeze, I think they're largely not in very interesting bits and pieces - and this feature has been in the works for about three eternities, and it is blocking a bunch of highly requested features. If anybody still has energy, I would appreciate a look at 0001, 0002, the new pieces I added, to make what's now 0003 and 0004 cleaner. > 0005 is missing author/reviewer, I'd propose: > [...] Thanks, I'll integrate them... > It's hard (given the amount of emails that have been send during all this > time), Indeed. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, On 4/7/23 8:24 PM, Drouvot, Bertrand wrote: Hi, On 4/7/23 5:47 PM, Andres Freund wrote: Hi, - write a test that invalidated logical slots do not lead to retaining WAL I'm not sure how to do that since pg_switch_wal() and friends can't be executed on a standby. You can do it on the primary and wait for the records to have been applied. Thanks, will give it a try in a couple of hours. I looked at it but I think we'd also need things like pg_walfile_name() on the standby but is not allowed. Is this patchset sufficient to subscribe to a publication on a physical standby, assuming the publication is created on the primary? If so, we should have at least a minimal test. If not, we should note that restriction explicitly. I gave it a try and it does work. " node3 subscribes to node2 (standby). Insert done in node1 (primary) where the publication is created => node3 see the changes. " I started to create the TAP test but currently stuck as the "create subscription" waits for a checkpoint/pg_log_standby_snapshot() on the primary. So, trying to make use of things like: "my %psql_subscriber = ('stdin' => '', 'stdout' => ''); $psql_subscriber{run} = $node_subscriber->background_psql('postgres', \$psql_subscriber{stdin}, \$psql_subscriber{stdout}, $psql_timeout); $psql_subscriber{stdout} = ''; " But in vain so far... please find attached sub_in_progress.patch that "should work" but "does not" because the wait_for_subscription_sync() call produces: " error running SQL: 'psql::1: ERROR: recovery is in progress HINT: WAL control functions cannot be executed during recovery.' while running 'psql -XAtq -d port=61441 host=/tmp/45dt3wqs2p dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT pg_current_wal_lsn()' " Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index 561dcd33c3..c3c0e718c8 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -8,14 +8,18 @@ use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; -my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $ret, $handle, $slot); +my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $subscriber_stdin, $subscriber_stdout, $subscriber_stderr, $ret, $handle, $slot); my $node_primary = PostgreSQL::Test::Cluster->new('primary'); my $node_standby = PostgreSQL::Test::Cluster->new('standby'); my $node_cascading_standby = PostgreSQL::Test::Cluster->new('cascading_standby'); +my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber'); my $default_timeout = $PostgreSQL::Test::Utils::timeout_default; +my $psql_timeout = + IPC::Run::timer(2 * $PostgreSQL::Test::Utils::timeout_default); my $res; + # Name for the physical slot on primary my $primary_slotname = 'primary_physical'; my $standby_physical_slotname = 'standby_physical'; @@ -263,6 +267,7 @@ $node_standby->init_from_backup( has_restoring => 1); $node_standby->append_conf('postgresql.conf', qq[primary_slot_name = '$primary_slotname']); +$node_standby->append_conf('postgresql.conf', 'max_replication_slots = 6'); $node_standby->start; $node_primary->wait_for_replay_catchup($node_standby); $node_standby->safe_psql('testdb', qq[SELECT * FROM pg_create_physical_replication_slot('$standby_physical_slotname');]); @@ -280,6 +285,20 @@ $node_cascading_standby->append_conf('postgresql.conf', $node_cascading_standby->start; $node_standby->wait_for_replay_catchup($node_cascading_standby, $node_primary); +### +# Initialize subscriber node +### +$node_subscriber->init(allows_streaming => 'logical'); +$node_subscriber->append_conf('postgresql.conf', 'max_replication_slots = 4'); +$node_subscriber->start; + +my %psql_subscriber = ('subscriber_stdin' => '', 'subscriber_stdout' => ''); +$psql_subscriber{run} = + $node_subscriber->background_psql('postgres', \$psql_subscriber{subscriber_stdin}, +\$psql_subscriber{subscriber_stdout}, +$psql_timeout); +$psql_subscriber{subscriber_stdout} = ''; + ## # Test that logical decoding on the standby # behaves correctly. @@ -360,6 +379,43 @@ is( $node_primary->psql( 3, 'replaying logical slot from another database fails'); +## +# Test that we can subscribe on the standby with the publication +# created on the primary. +## + +# Create a table on the primary +$node_primary->safe_psql('postgres', "CREATE TABLE tab_rep (a int primary key)"); + +# Create a table (same structure) on the subscriber node +$node_subscriber->safe_psql('postgres', "CREATE TABLE tab_rep (a int pri
Re: Minimal logical decoding on standbys
Hi, On 4/7/23 8:27 PM, Drouvot, Bertrand wrote: Hi, I think some of the patches might have more reviewers than really applicable, and might also miss some. I'd appreciate if you could go over that... Sure, will do in a couple of hours. That looks good to me, just few remarks: 0005 is missing author/reviewer, I'd propose: Author: "Drouvot, Bertrand" Author: Andres Freund Reviewed-by: Andres Freund Reviewed-by: Robert Haas Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/20230407075009.igg7be27ha2ht...@awork3.anarazel.de 0006, I'd propose: Author: "Drouvot, Bertrand" Reviewed-By: Jeff Davis Reviewed-By: Robert Haas Reviewed-by: Amit Kapila Reviewed-by: Masahiko Sawada 0007, I'd propose: Author: "Drouvot, Bertrand" Author: Andres Freund Author: Amit Khandekar (in an older version) Reviewed-by: FabrÌzio de Royes Mello Reviewed-by: Amit Kapila Reviewed-By: Robert Haas 0009, I'd propose: Author: "Drouvot, Bertrand" Author: Andres Freund Author: Amit Khandekar (in an older version) Reviewed-by: FabrÌzio de Royes Mello Reviewed-by: Amit Kapila Reviewed-By: Robert Haas It's hard (given the amount of emails that have been send during all this time), but I do hope it's correct and that nobody is missing. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/7/23 8:12 PM, Andres Freund wrote: Hi, On 2023-04-07 08:47:57 -0700, Andres Freund wrote: Integrated all of these. Here's my current version. Changes: - Integrated Bertrand's changes - polished commit messages of 0001-0003 - edited code comments for 0003, including InvalidateObsoleteReplicationSlots()'s header - added a bump of SLOT_VERSION to 0001 - moved addition of pg_log_standby_snapshot() to 0007 - added a catversion bump for pg_log_standby_snapshot() - moved all the bits dealing with procsignals from 0003 to 0004, now the split makes sense IMO - combined a few more sucessive ->safe_psql() calls Thanks! I see occasional failures in the tests, particularly in the new test using pg_authid, but not solely. cfbot also seems to have seen these: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3740 I made a bogus attempt at a workaround for the pg_authid case last night. But that didn't actually fix anything, it just changed the timing. I think the issue is that VACUUM does not force WAL to be flushed at the end (since it does not assign an xid). wait_for_replay_catchup() uses $node->lsn('flush'), which, due to VACUUM not flushing, can be an LSN from before VACUUM completed. The problem can be made more likely by adding pg_usleep(100); before walwriter.c's call to XLogBackgroundFlush(). We probably should introduce some infrastructure in Cluster.pm for this, but for now I just added a 'flush_wal' table that we insert into after a VACUUM. That guarantees a WAL flush. Ack for the Cluster.pm "improvement" and thanks for the "workaround"! I think some of the patches might have more reviewers than really applicable, and might also miss some. I'd appreciate if you could go over that... Sure, will do in a couple of hours. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/7/23 5:47 PM, Andres Freund wrote: Hi, - write a test that invalidated logical slots do not lead to retaining WAL I'm not sure how to do that since pg_switch_wal() and friends can't be executed on a standby. You can do it on the primary and wait for the records to have been applied. Thanks, will give it a try in a couple of hours. - Further evolve the API of InvalidateObsoleteReplicationSlots() - pass in the ReplicationSlotInvalidationCause we're trying to conflict on? - rename xid to snapshotConflictHorizon, that'd be more in line with the ResolveRecoveryConflictWithSnapshot and easier to understand, I think Done. The new API can be found in v65-66-InvalidateObsoleteReplicationSlots_API.patch attached. It propagates the cause to InvalidatePossiblyObsoleteSlot() where a switch/case can now be used. Integrated. I moved the cause to the first argument, makes more sense to me that way. thanks! I made it an error - it's a programming error, not some data level inconsistency if that ever happens. okay, makes sense. Integrated all of these. Thanks! I think pg_log_standby_snapshot() should be added in "Allow logical decoding on standby", not the commit adding the tests. Yeah, that's a good point, I do agree. Is this patchset sufficient to subscribe to a publication on a physical standby, assuming the publication is created on the primary? If so, we should have at least a minimal test. If not, we should note that restriction explicitly. I gave it a try and it does work. " node3 subscribes to node2 (standby). Insert done in node1 (primary) where the publication is created => node3 see the changes. " I started to create the TAP test but currently stuck as the "create subscription" waits for a checkpoint/pg_log_standby_snapshot() on the primary. So, trying to make use of things like: "my %psql_subscriber = ('stdin' => '', 'stdout' => ''); $psql_subscriber{run} = $node_subscriber->background_psql('postgres', \$psql_subscriber{stdin}, \$psql_subscriber{stdout}, $psql_timeout); $psql_subscriber{stdout} = ''; " But in vain so far... Will resume working on it in a couple of hours. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 2023-04-07 08:47:57 -0700, Andres Freund wrote: > Integrated all of these. Here's my current version. Changes: - Integrated Bertrand's changes - polished commit messages of 0001-0003 - edited code comments for 0003, including InvalidateObsoleteReplicationSlots()'s header - added a bump of SLOT_VERSION to 0001 - moved addition of pg_log_standby_snapshot() to 0007 - added a catversion bump for pg_log_standby_snapshot() - moved all the bits dealing with procsignals from 0003 to 0004, now the split makes sense IMO - combined a few more sucessive ->safe_psql() calls I see occasional failures in the tests, particularly in the new test using pg_authid, but not solely. cfbot also seems to have seen these: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F42%2F3740 I made a bogus attempt at a workaround for the pg_authid case last night. But that didn't actually fix anything, it just changed the timing. I think the issue is that VACUUM does not force WAL to be flushed at the end (since it does not assign an xid). wait_for_replay_catchup() uses $node->lsn('flush'), which, due to VACUUM not flushing, can be an LSN from before VACUUM completed. The problem can be made more likely by adding pg_usleep(100); before walwriter.c's call to XLogBackgroundFlush(). We probably should introduce some infrastructure in Cluster.pm for this, but for now I just added a 'flush_wal' table that we insert into after a VACUUM. That guarantees a WAL flush. I think some of the patches might have more reviewers than really applicable, and might also miss some. I'd appreciate if you could go over that... Greetings, Andres Freund >From 0e038eb5dfddec500fbf4625775d1fa508a208f6 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 6 Apr 2023 20:00:07 -0700 Subject: [PATCH va67 1/9] Replace a replication slot's invalidated_at LSN with an enum This is mainly useful because the upcoming logical-decoding-on-standby feature adds further reasons for invalidating slots, and we don't want to end up with multiple invalidated_* fields, or check different attributes. Eventually we should consider not resetting restart_lsn when invalidating a slot due to max_slot_wal_keep_size. But that's a user visible change, so left for later. Increases SLOT_VERSION, due to the changed field (with a different alignment, no less). Reviewed-by: "Drouvot, Bertrand" Discussion: https://postgr.es/m/20230407075009.igg7be27ha2ht...@awork3.anarazel.de --- src/include/replication/slot.h | 15 +-- src/backend/replication/slot.c | 28 src/backend/replication/slotfuncs.c | 8 +++- src/tools/pgindent/typedefs.list| 1 + 4 files changed, 41 insertions(+), 11 deletions(-) diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 8872c80cdfe..ebcb637baed 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency RS_TEMPORARY } ReplicationSlotPersistency; +/* + * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the + * 'invalidated' field is set to a value other than _NONE. + */ +typedef enum ReplicationSlotInvalidationCause +{ + RS_INVAL_NONE, + /* required WAL has been removed */ + RS_INVAL_WAL, +} ReplicationSlotInvalidationCause; + /* * On-Disk data of a replication slot, preserved across restarts. */ @@ -72,8 +83,8 @@ typedef struct ReplicationSlotPersistentData /* oldest LSN that might be required by this replication slot */ XLogRecPtr restart_lsn; - /* restart_lsn is copied here when the slot is invalidated */ - XLogRecPtr invalidated_at; + /* RS_INVAL_NONE if valid, or the reason for having been invalidated */ + ReplicationSlotInvalidationCause invalidated; /* * Oldest LSN that the client has acked receipt for. This is used as the diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index 2293c0c6fc3..df23b7ed31e 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -89,7 +89,7 @@ typedef struct ReplicationSlotOnDisk sizeof(ReplicationSlotOnDisk) - ReplicationSlotOnDiskConstantSize #define SLOT_MAGIC 0x1051CA1 /* format identifier */ -#define SLOT_VERSION 2 /* version for new files */ +#define SLOT_VERSION 3 /* version for new files */ /* Control array for replication slot management */ ReplicationSlotCtlData *ReplicationSlotCtl = NULL; @@ -855,8 +855,7 @@ ReplicationSlotsComputeRequiredXmin(bool already_locked) SpinLockAcquire(&s->mutex); effective_xmin = s->effective_xmin; effective_catalog_xmin = s->effective_catalog_xmin; - invalidated = (!XLogRecPtrIsInvalid(s->data.invalidated_at) && - XLogRecPtrIsInvalid(s->data.restart_lsn)); + invalidated = s->data.invalidated != RS_INVAL_NONE; SpinLockRelease(&s->mutex); /* invalidated slots need not apply */ @@ -901,14 +900,20 @@ ReplicationSlotsComputeRequiredLSN(void) { Repl
Re: Minimal logical decoding on standbys
Hi, On 2023-04-07 17:13:13 +0200, Drouvot, Bertrand wrote: > On 4/7/23 9:50 AM, Andres Freund wrote: > > I added a check for !invalidated to > > ReplicationSlotsComputeRequiredLSN() etc. > > > > looked at 65-0001 and it looks good to me. > > > Added new patch moving checks for invalid logical slots into > > CreateDecodingContext(). Otherwise we end up with 5 or so checks, which > > makes > > no sense. As far as I can tell the old message in > > pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having > > "never previously reserved WAL" > > > > looked at 65-0002 and it looks good to me. > > > Split "Handle logical slot conflicts on standby." into two. I'm not sure > > that > > should stay that way, but it made it easier to hack on > > InvalidateObsoleteReplicationSlots. > > > > looked at 65-0003 and the others. Thanks for checking! > > Todo: > > - write a test that invalidated logical slots stay invalidated across a > > restart > > Done in 65-66-0008 attached. Cool. > > - write a test that invalidated logical slots do not lead to retaining WAL > > I'm not sure how to do that since pg_switch_wal() and friends can't be > executed on > a standby. You can do it on the primary and wait for the records to have been applied. > > - Further evolve the API of InvalidateObsoleteReplicationSlots() > >- pass in the ReplicationSlotInvalidationCause we're trying to conflict > > on? > >- rename xid to snapshotConflictHorizon, that'd be more in line with the > > ResolveRecoveryConflictWithSnapshot and easier to understand, I think > > > > Done. The new API can be found in > v65-66-InvalidateObsoleteReplicationSlots_API.patch > attached. It propagates the cause to InvalidatePossiblyObsoleteSlot() where a > switch/case > can now be used. Integrated. I moved the cause to the first argument, makes more sense to me that way. > The "default" case does not emit an error since this code runs as part > of checkpoint. I made it an error - it's a programming error, not some data level inconsistency if that ever happens. > > - The test could stand a bit of cleanup and consolidation > >- No need to start 4 psql processes to do 4 updates, just do it in one > > safe_psql() > > Right, done in v65-66-0008-New-TAP-test-for-logical-decoding-on-standby.patch > attached. > >- the sequence of drop_logical_slots(), create_logical_slots(), > > change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is > > repeated quite a few times > > grouped in reactive_slots_change_hfs_and_wait_for_xmins() in 65-66-0008 > attached. > > >- the stats queries checking for specific conflict counts, including > > preceding tests, is pretty painful. I suggest to reset the stats at the > > end of the test instead (likely also do the drop_logical_slot() there). > > Good idea, done in 65-66-0008 attached. > > >- it's hard to correlate postgres log and the tap test, because the slots > > are named the same across all tests. Perhaps they could have a per-test > > prefix? > > Good point. Done in 65-66-0008 attached. Thanks to that and the stats reset > the > check for invalidation is now done in a single function > "check_for_invalidation" that looks > for invalidation messages in the logfile and in pg_stat_database_conflicts. > > Thanks for the suggestions: the TAP test is now easier to read/understand. Integrated all of these. I think pg_log_standby_snapshot() should be added in "Allow logical decoding on standby", not the commit adding the tests. Is this patchset sufficient to subscribe to a publication on a physical standby, assuming the publication is created on the primary? If so, we should have at least a minimal test. If not, we should note that restriction explicitly. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, On 4/7/23 9:50 AM, Andres Freund wrote: Hi, Here's my current working state - I'll go to bed soon. Thanks a lot for this Andres! Changes: - shared catalog relations weren't handled correctly, because the dboid is InvalidOid for them. I wrote a test for that as well. - ReplicationSlotsComputeRequiredXmin() took invalidated logical slots into account (ReplicationSlotsComputeLogicalRestartLSN() too, but it never looks at logical slots) - I don't think the subset of slot xids that were checked when invalidating was right. We need to check effective_xmin and effective_catalog_xmin - the latter was using catalog_xmin. - similarly, it wasn't right that specifically those two fields were overwritten when invalidated - as that was done, I suspect the changes might get lost on a restart... - As mentioned previously, I did not like all the functions in slot.h, nor their naming. Not yet quite finished with that, but a good bit further - There were a lot of unrelated changes, e.g. removing comments like * NB - this runs as part of checkpoint, so avoid raising errors if possible. - I still don't like the order of the patches, fixing the walsender patches after introducing support for logical decoding on standby. Reordered. - I don't think logical slots being invalidated as checked e.g. in pg_logical_replication_slot_advance() - I didn't like much that InvalidatePossiblyObsoleteSlot() switched between kill() and SendProcSignal() based on the "conflict". There very well could be reasons to use InvalidatePossiblyObsoleteSlot() with an xid from outside of the startup process in the future. Instead I made it differentiate based on MyBackendType == B_STARTUP. Thanks for all of this and the above explanations. I also: Added new patch that replaces invalidated_at with a new enum, 'invalidated', listing the reason for the invalidation. Yeah, that's a great idea. I added a check for !invalidated to ReplicationSlotsComputeRequiredLSN() etc. looked at 65-0001 and it looks good to me. Added new patch moving checks for invalid logical slots into CreateDecodingContext(). Otherwise we end up with 5 or so checks, which makes no sense. As far as I can tell the old message in pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having "never previously reserved WAL" looked at 65-0002 and it looks good to me. Split "Handle logical slot conflicts on standby." into two. I'm not sure that should stay that way, but it made it easier to hack on InvalidateObsoleteReplicationSlots. looked at 65-0003 and the others. It's easier to understand/read the code now that the ReplicationSlotInvalidationCause enum has been created and that data.invalidated also make use of the enum. It does "simplify" the review and that looks good to me. Todo: - write a test that invalidated logical slots stay invalidated across a restart Done in 65-66-0008 attached. - write a test that invalidated logical slots do not lead to retaining WAL I'm not sure how to do that since pg_switch_wal() and friends can't be executed on a standby. - Further evolve the API of InvalidateObsoleteReplicationSlots() - pass in the ReplicationSlotInvalidationCause we're trying to conflict on? - rename xid to snapshotConflictHorizon, that'd be more in line with the ResolveRecoveryConflictWithSnapshot and easier to understand, I think Done. The new API can be found in v65-66-InvalidateObsoleteReplicationSlots_API.patch attached. It propagates the cause to InvalidatePossiblyObsoleteSlot() where a switch/case can now be used. The "default" case does not emit an error since this code runs as part of checkpoint. - The test could stand a bit of cleanup and consolidation - No need to start 4 psql processes to do 4 updates, just do it in one safe_psql() Right, done in v65-66-0008-New-TAP-test-for-logical-decoding-on-standby.patch attached. - the sequence of drop_logical_slots(), create_logical_slots(), change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is repeated quite a few times grouped in reactive_slots_change_hfs_and_wait_for_xmins() in 65-66-0008 attached. - the stats queries checking for specific conflict counts, including preceding tests, is pretty painful. I suggest to reset the stats at the end of the test instead (likely also do the drop_logical_slot() there). Good idea, done in 65-66-0008 attached. - it's hard to correlate postgres log and the tap test, because the slots are named the same across all tests. Perhaps they could have a per-test prefix? Good point. Done in 65-66-0008 attached. Thanks to that and the stats reset the check for invalidation is now done in a single function "check_for_invalidation" that looks for invalidation messages in the logfile and in pg_stat_database_conflicts. Thanks for the suggestions: the TAP test is now easier to read/understand. -
Re: Minimal logical decoding on standbys
Hi, On 2023-04-07 08:09:50 +0200, Drouvot, Bertrand wrote: > Hi, > > On 4/7/23 7:56 AM, Andres Freund wrote: > > Hi, > > > > On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote: > > > Done in V63 attached and did change the associated comment a bit. > > > > Can you send your changes incrementally, relative to V62? I'm polishing them > > right now, and that'd make it a lot easier to apply your changes ontop. > > > > Sure, please find them enclosed. Thanks. Here's my current working state - I'll go to bed soon. Changes: - shared catalog relations weren't handled correctly, because the dboid is InvalidOid for them. I wrote a test for that as well. - ReplicationSlotsComputeRequiredXmin() took invalidated logical slots into account (ReplicationSlotsComputeLogicalRestartLSN() too, but it never looks at logical slots) - I don't think the subset of slot xids that were checked when invalidating was right. We need to check effective_xmin and effective_catalog_xmin - the latter was using catalog_xmin. - similarly, it wasn't right that specifically those two fields were overwritten when invalidated - as that was done, I suspect the changes might get lost on a restart... - As mentioned previously, I did not like all the functions in slot.h, nor their naming. Not yet quite finished with that, but a good bit further - There were a lot of unrelated changes, e.g. removing comments like * NB - this runs as part of checkpoint, so avoid raising errors if possible. - I still don't like the order of the patches, fixing the walsender patches after introducing support for logical decoding on standby. Reordered. - I don't think logical slots being invalidated as checked e.g. in pg_logical_replication_slot_advance() - I didn't like much that InvalidatePossiblyObsoleteSlot() switched between kill() and SendProcSignal() based on the "conflict". There very well could be reasons to use InvalidatePossiblyObsoleteSlot() with an xid from outside of the startup process in the future. Instead I made it differentiate based on MyBackendType == B_STARTUP. I also: Added new patch that replaces invalidated_at with a new enum, 'invalidated', listing the reason for the invalidation. I added a check for !invalidated to ReplicationSlotsComputeRequiredLSN() etc. Added new patch moving checks for invalid logical slots into CreateDecodingContext(). Otherwise we end up with 5 or so checks, which makes no sense. As far as I can tell the old message in pg_logical_slot_get_changes_guts() was bogus, one couldn't get there having "never previously reserved WAL" Split "Handle logical slot conflicts on standby." into two. I'm not sure that should stay that way, but it made it easier to hack on InvalidateObsoleteReplicationSlots. Todo: - write a test that invalidated logical slots stay invalidated across a restart - write a test that invalidated logical slots do not lead to retaining WAL - Further evolve the API of InvalidateObsoleteReplicationSlots() - pass in the ReplicationSlotInvalidationCause we're trying to conflict on? - rename xid to snapshotConflictHorizon, that'd be more in line with the ResolveRecoveryConflictWithSnapshot and easier to understand, I think - The test could stand a bit of cleanup and consolidation - No need to start 4 psql processes to do 4 updates, just do it in one safe_psql() - the sequence of drop_logical_slots(), create_logical_slots(), change_hot_standby_feedback_and_wait_for_xmins(), make_slot_active() is repeated quite a few times - the stats queries checking for specific conflict counts, including preceding tests, is pretty painful. I suggest to reset the stats at the end of the test instead (likely also do the drop_logical_slot() there). - it's hard to correlate postgres log and the tap test, because the slots are named the same across all tests. Perhaps they could have a per-test prefix? - numbering tests is a PITA, I had to renumber the later ones, when adding a test for shared catalog tables My attached version does include your v62-63 incremental chnages. Greetings, Andres Freund >From 1e5461e0019678a92192b0dd5d9bf3f7105f504d Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 6 Apr 2023 20:00:07 -0700 Subject: [PATCH va65 1/9] replication slots: replace invalidated_at LSN with an enum --- src/include/replication/slot.h | 15 +-- src/backend/replication/slot.c | 21 ++--- src/backend/replication/slotfuncs.c | 8 +++- 3 files changed, 34 insertions(+), 10 deletions(-) diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 8872c80cdfe..793f0701b88 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -37,6 +37,17 @@ typedef enum ReplicationSlotPersistency RS_TEMPORARY } ReplicationSlotPersistency; +/* + * Slots can be invalidated, e.g. due to max_slot_wal_keep_size. If so, the + * 'invalidated' field is se
Re: Minimal logical decoding on standbys
Hi, On 4/7/23 7:56 AM, Andres Freund wrote: Hi, On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote: Done in V63 attached and did change the associated comment a bit. Can you send your changes incrementally, relative to V62? I'm polishing them right now, and that'd make it a lot easier to apply your changes ontop. Sure, please find them enclosed. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/recovery/t/035_standby_logical_decoding.pl b/src/test/recovery/t/035_standby_logical_decoding.pl index a4f9a3c972..5b6d19d379 100644 --- a/src/test/recovery/t/035_standby_logical_decoding.pl +++ b/src/test/recovery/t/035_standby_logical_decoding.pl @@ -6,7 +6,7 @@ use warnings; use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; -use Test::More tests => 67; +use Test::More; my ($stdin, $stdout, $stderr, $cascading_stdout, $cascading_stderr, $ret, $handle, $slot); @@ -112,8 +112,7 @@ sub check_slots_dropped check_pg_recvlogical_stderr($slot_user_handle, "conflict with recovery"); } -# Check if all the slots on standby are dropped. These include the 'activeslot' -# that was acquired by make_slot_active(), and the non-active 'inactiveslot'. +# Change hot_standby_feedback and check xmin and catalog_xmin values. sub change_hot_standby_feedback_and_wait_for_xmins { my ($hsf, $invalidated) = @_; @@ -560,7 +559,7 @@ ok( find_in_log( 'activeslot slot invalidation is logged due to wal_level'); # Verify that pg_stat_database_conflicts.confl_active_logicalslot has been updated -# we now expect 3 conflicts reported as the counter persist across reloads +# we now expect 4 conflicts reported as the counter persist across reloads ok( $node_standby->poll_query_until( 'postgres', "select (confl_active_logicalslot = 4) from pg_stat_database_conflicts where datname = 'testdb'", 't'), @@ -703,3 +702,5 @@ ok( pump_until( chomp($cascading_stdout); is($cascading_stdout, $expected, 'got same expected output from pg_recvlogical decoding session on cascading standby'); + +done_testing(); diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index cc4f7b5302..e6427c54c5 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -1945,7 +1945,7 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord *record, TimeLineID *repl * Indeed, logical walsenders on standby can't decode and send data until * it's been applied. * -* Physical walsenders don't need to be wakon up during replay unless +* Physical walsenders don't need to be woken up during replay unless * cascading replication is allowed and time line change occured (so that * they can notice that they are on a new time line). * @@ -1953,9 +1953,11 @@ ApplyWalRecord(XLogReaderState *xlogreader, XLogRecord *record, TimeLineID *repl * * - physical walsenders in case of new time line and cascade * replication is allowed. -* - always true for logical walsenders. +* - logical walsenders in case cascade replication is allowed (could not +* be created otherwise). */ - WalSndWakeup(switchedTLI && AllowCascadeReplication(), true); + if (AllowCascadeReplication()) + WalSndWakeup(switchedTLI, true); /* * If rm_redo called XLogRequestWalReceiverReply, then we wake up the
Re: Minimal logical decoding on standbys
Hi, On 2023-04-07 07:02:04 +0200, Drouvot, Bertrand wrote: > Done in V63 attached and did change the associated comment a bit. Can you send your changes incrementally, relative to V62? I'm polishing them right now, and that'd make it a lot easier to apply your changes ontop. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, On 4/7/23 5:47 AM, Amit Kapila wrote: On Thu, Apr 6, 2023 at 7:50 PM Drouvot, Bertrand wrote: Thanks! Will update 0005. I noticed a few typos in the latest patches. 0004 1. + * Physical walsenders don't need to be wakon up during replay unless Typo. Thanks! Fixed in V63 just posted up-thread. 0005 2. +# Check if all the slots on standby are dropped. These include the 'activeslot' +# that was acquired by make_slot_active(), and the non-active 'inactiveslot'. +sub check_slots_dropped +{ + my ($slot_user_handle) = @_; + + is($node_standby->slot('inactiveslot')->{'slot_type'}, '', 'inactiveslot on standby dropped'); + is($node_standby->slot('activeslot')->{'slot_type'}, '', 'activeslot on standby dropped'); + + check_pg_recvlogical_stderr($slot_user_handle, "conflict with recovery"); +} + +# Check if all the slots on standby are dropped. These include the 'activeslot' +# that was acquired by make_slot_active(), and the non-active 'inactiveslot'. +sub change_hot_standby_feedback_and_wait_for_xmins +{ + my ($hsf, $invalidated) = @_; + + $node_standby->append_conf('postgresql.conf',qq[ + hot_standby_feedback = $hsf + ]); + + $node_standby->reload; + + if ($hsf && $invalidated) + { ... The comment above change_hot_standby_feedback_and_wait_for_xmins seems to be wrong. It seems to be copied from the previous test. Good catch! Fixed in V63. 3. +# Verify that pg_stat_database_conflicts.confl_active_logicalslot has been updated +# we now expect 3 conflicts reported as the counter persist across reloads +ok( $node_standby->poll_query_until( + 'postgres', + "select (confl_active_logicalslot = 4) from pg_stat_database_conflicts where datname = 'testdb'", 't'), + 'confl_active_logicalslot updated') or die "Timed out waiting confl_active_logicalslot to be updated"; The comment incorrectly mentions 3 conflicts whereas the query expects 4. Good catch, fixed in v63. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/6/23 4:20 PM, Drouvot, Bertrand wrote: Hi, On 4/6/23 3:39 PM, Amit Kapila wrote: On Thu, Apr 6, 2023 at 6:32 PM Drouvot, Bertrand wrote: I don't think it could be possible to create logical walsenders on a standby if AllowCascadeReplication() is not true, or am I missing something? Right, so why to even traverse walsenders for that case? What I was imagining a code is like: if (AllowCascadeReplication()) WalSndWakeup(switchedTLI, true); Do you see any problem with this change? Not at all, it looks good to me. Done in V63 attached and did change the associated comment a bit. Few more minor comments on 0005 = 0005 1. + + Take a snapshot of running transactions and write this to WAL without + having to wait bgwriter or checkpointer to log one. /wait bgwriter/wait for bgwriter 2. +use Test::More tests => 67; We no more use the number of tests. Please refer to other similar tests. Thanks! Will update 0005. Done in V63. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom b5ff35d8ace1e429c6a15d53203b00304a3ff1f4 Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 14:08:11 + Subject: [PATCH v63 6/6] Doc changes describing details about logical decoding. Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/logicaldecoding.sgml | 27 +++ 1 file changed, 27 insertions(+) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 4e912b4bd4..8651024b8a 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -316,6 +316,33 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU may consume changes from a slot at any given time. + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Then, the + primary may delete system catalog rows that could be needed by the logical + decoding on the standby (as it does not know about the catalog_xmin on the + standby). Existing logical slots on standby also get invalidated if wal_level + on primary is reduced to less than 'logical'. This is done as soon as the + standby detects such a change in the WAL stream. It means, that for walsenders + that are lagging (if any), some WAL records up to the wal_level parameter change + on the primary won't be decoded. + + + + For a logical slot to be created, it builds a historic snapshot, for which + information of all the currently running transactions is essential. On + primary, this information is available, but on standby, this information + has to be obtained from primary. So, slot creation may wait for some + activity to happen on the primary. If the primary is idle, creating a + logical slot on standby may take a noticeable time. One option to speed it + is to call the pg_log_standby_snapshot on the primary. + + Replication slots persist across crashes and know nothing about the state -- 2.34.1 From ee35ea655de6d3178a1ec1b7b70345e2f4adccb5 Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 09:04:12 + Subject: [PATCH v63 5/6] New TAP test for logical decoding on standby. In addition to the new TAP test, this commit introduces a new pg_log_standby_snapshot() function. The idea is to be able to take a snapshot of running transactions and write this to WAL without requesting for a (costly) checkpoint. Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/func.sgml| 15 + src/backend/access/transam/xlogfuncs.c| 32 + src/backend/catalog/system_functions.sql | 2 + src/include/catalog/pg_proc.dat | 3 + src/test/perl/PostgreSQL/Test/Cluster.pm | 37 + src/test/recovery/meson.build | 1 + .../t/035_standby_logical_decoding.pl | 706 ++ 7 files changed, 796 insertions(+) 3.1% src/backend/ 4.0% src/test/perl/PostgreSQL/Test/ 89.6% src/test/recovery/t/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index dc44a74eb2..9253cd1c18 100644 --- a/doc/src/sgm
Re: Minimal logical decoding on standbys
Hi, On 4/7/23 4:18 AM, Andres Freund wrote: Hi, TBH, I don't like the state of 0001 much. I'm working on polishing it now. Thanks Andres! A lot of the new functions in slot.h don't seem right to me: - ObsoleteSlotIsInvalid() - isn't an obsolete slot by definition invalid? bad naming, agree. - Why does ObsoleteSlotIsInvalid() sometime check invalidated_at and sometimes not? because part of the existing code was doing so (checking if s->data.restart_lsn is valid with/without checking if data.invalidated_at is valid) and I thought it was better not to change it. - TransactionIdIsValidPrecedesOrEquals() shouldn't be defined in slot.h - also, it's not actually clear what semantics it's trying to have. Oh right, my bad for the location. - there's no commonality in naming between the functions used to test if a slot needs to be invalidated (SlotIsFreshEnough() vs LogicalSlotIsNotConflicting()). Agree, my bad. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
On 4/7/23 3:59 AM, Amit Kapila wrote: On Fri, Apr 7, 2023 at 6:55 AM Andres Freund wrote: On 2023-04-06 12:10:57 +0530, Amit Kapila wrote: After this, I think for backends that have active slots, it would simply cancel the current query. Will that be sufficient? Because we want the backend process should exit and release the slot so that the startup process can mark it invalid. We don't need them to exit, we just need them to release the slot. Which does happen when the query is cancelled. Imagine if that weren't the case - if a cancellation of pg_logical_slot_* wouldn't release the slot, we couldn't call it again before disconnecting. I also did verify that indeed the slot is released upon a cancellation. makes sense. Thanks for the clarification! +1, thanks Andres! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
On Thu, Apr 6, 2023 at 7:50 PM Drouvot, Bertrand wrote: > > Thanks! Will update 0005. > I noticed a few typos in the latest patches. 0004 1. + * Physical walsenders don't need to be wakon up during replay unless Typo. 0005 2. +# Check if all the slots on standby are dropped. These include the 'activeslot' +# that was acquired by make_slot_active(), and the non-active 'inactiveslot'. +sub check_slots_dropped +{ + my ($slot_user_handle) = @_; + + is($node_standby->slot('inactiveslot')->{'slot_type'}, '', 'inactiveslot on standby dropped'); + is($node_standby->slot('activeslot')->{'slot_type'}, '', 'activeslot on standby dropped'); + + check_pg_recvlogical_stderr($slot_user_handle, "conflict with recovery"); +} + +# Check if all the slots on standby are dropped. These include the 'activeslot' +# that was acquired by make_slot_active(), and the non-active 'inactiveslot'. +sub change_hot_standby_feedback_and_wait_for_xmins +{ + my ($hsf, $invalidated) = @_; + + $node_standby->append_conf('postgresql.conf',qq[ + hot_standby_feedback = $hsf + ]); + + $node_standby->reload; + + if ($hsf && $invalidated) + { ... The comment above change_hot_standby_feedback_and_wait_for_xmins seems to be wrong. It seems to be copied from the previous test. 3. +# Verify that pg_stat_database_conflicts.confl_active_logicalslot has been updated +# we now expect 3 conflicts reported as the counter persist across reloads +ok( $node_standby->poll_query_until( + 'postgres', + "select (confl_active_logicalslot = 4) from pg_stat_database_conflicts where datname = 'testdb'", 't'), + 'confl_active_logicalslot updated') or die "Timed out waiting confl_active_logicalslot to be updated"; The comment incorrectly mentions 3 conflicts whereas the query expects 4. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
On Fri, Apr 7, 2023 at 8:43 AM Andres Freund wrote: > > On 2023-04-06 12:10:57 +0530, Amit Kapila wrote: > > Also, it seems you have removed the checks related to > > slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only > > used for logical slots? If so, do you think an Assert would make > > sense? > > The asserts that have been added aren't correct. There's no guarantee that the > receiver of the procsignal still holds the same slot or any slot at all. > For backends, that don't hold any slot, can we skip setting the RecoveryConflictPending and other flags? -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 2023-04-06 12:10:57 +0530, Amit Kapila wrote: > Also, it seems you have removed the checks related to > slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only > used for logical slots? If so, do you think an Assert would make > sense? The asserts that have been added aren't correct. There's no guarantee that the receiver of the procsignal still holds the same slot or any slot at all. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, TBH, I don't like the state of 0001 much. I'm working on polishing it now. A lot of the new functions in slot.h don't seem right to me: - ObsoleteSlotIsInvalid() - isn't an obsolete slot by definition invalid? - Why does ObsoleteSlotIsInvalid() sometime check invalidated_at and sometimes not? - DoNotInvalidateSlot() seems too generic a name for a function exposed to the outside of slot.c - TransactionIdIsValidPrecedesOrEquals() shouldn't be defined in slot.h - also, it's not actually clear what semantics it's trying to have. - there's no commonality in naming between the functions used to test if a slot needs to be invalidated (SlotIsFreshEnough() vs LogicalSlotIsNotConflicting()). Leaving naming etc aside, most of these don't seem to belong in slot.h, but should just be in slot.c - there aren't conceivable users from outside slot.c. Independent of this patch: What's the point of invalidated_at? The only reads of it are done like invalidated = (!XLogRecPtrIsInvalid(s->data.invalidated_at) && XLogRecPtrIsInvalid(s->data.restart_lsn)); i.e. the actual LSN is not used. ISTM that we should just have it be a boolean, and that it should be used by the different kinds of invalidating a slot. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
On Fri, Apr 7, 2023 at 6:55 AM Andres Freund wrote: > > On 2023-04-06 12:10:57 +0530, Amit Kapila wrote: > > After this, I think for backends that have active slots, it would > > simply cancel the current query. Will that be sufficient? Because we > > want the backend process should exit and release the slot so that the > > startup process can mark it invalid. > > We don't need them to exit, we just need them to release the slot. Which does > happen when the query is cancelled. Imagine if that weren't the case - if a > cancellation of pg_logical_slot_* wouldn't release the slot, we couldn't call > it again before disconnecting. I also did verify that indeed the slot is > released upon a cancellation. > makes sense. Thanks for the clarification! -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 2023-04-06 12:10:57 +0530, Amit Kapila wrote: > After this, I think for backends that have active slots, it would > simply cancel the current query. Will that be sufficient? Because we > want the backend process should exit and release the slot so that the > startup process can mark it invalid. We don't need them to exit, we just need them to release the slot. Which does happen when the query is cancelled. Imagine if that weren't the case - if a cancellation of pg_logical_slot_* wouldn't release the slot, we couldn't call it again before disconnecting. I also did verify that indeed the slot is released upon a cancellation. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, On 4/6/23 3:39 PM, Amit Kapila wrote: On Thu, Apr 6, 2023 at 6:32 PM Drouvot, Bertrand wrote: I don't think it could be possible to create logical walsenders on a standby if AllowCascadeReplication() is not true, or am I missing something? Right, so why to even traverse walsenders for that case? What I was imagining a code is like: if (AllowCascadeReplication()) WalSndWakeup(switchedTLI, true); Do you see any problem with this change? Not at all, it looks good to me. Few more minor comments on 0005 = 0005 1. + +Take a snapshot of running transactions and write this to WAL without +having to wait bgwriter or checkpointer to log one. /wait bgwriter/wait for bgwriter 2. +use Test::More tests => 67; We no more use the number of tests. Please refer to other similar tests. Thanks! Will update 0005. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
On Thu, Apr 6, 2023 at 6:32 PM Drouvot, Bertrand wrote: > > Hi, > > On 4/6/23 11:55 AM, Amit Kapila wrote: > > On Thu, Apr 6, 2023 at 12:10 PM Amit Kapila wrote: > >> > >> On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand > >> wrote: > >>> > >> > >> Another comment on 0001. > >> extern void CheckSlotRequirements(void); > >> extern void CheckSlotPermissions(void); > >> +extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid, > >> TransactionId xid, char *reason); > >> > >> This doesn't seem to be called from anywhere. > >> > > > > Few other comments: > > == > > 0004 > > 1. > > + * - physical walsenders in case of new time line and cascade > > + * replication is allowed. > > + * - logical walsenders in case of new time line or recovery is in > > progress > > + * (logical decoding on standby). > > + */ > > + WalSndWakeup(switchedTLI && AllowCascadeReplication(), > > + switchedTLI || RecoveryInProgress()); > > > > Do we need AllowCascadeReplication() check specifically for physical > > walsenders? I think this should be true for both physical and logical > > walsenders. > > > > I don't think it could be possible to create logical walsenders on a standby > if > AllowCascadeReplication() is not true, or am I missing something? > Right, so why to even traverse walsenders for that case? What I was imagining a code is like: if (AllowCascadeReplication()) WalSndWakeup(switchedTLI, true); Do you see any problem with this change? Few more minor comments on 0005 = 0005 1. + +Take a snapshot of running transactions and write this to WAL without +having to wait bgwriter or checkpointer to log one. /wait bgwriter/wait for bgwriter 2. +use Test::More tests => 67; We no more use the number of tests. Please refer to other similar tests. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 4/6/23 2:23 PM, Amit Kapila wrote: On Thu, Apr 6, 2023 at 11:29 AM Amit Kapila wrote: Thinking some more on this, I think such a slot won't decode any other records. During CreateInitDecodingContext->ReplicationSlotReserveWal, for standby's, we use lastReplayedEndRecPtr as restart_lsn. This should be a record before parameter_change record in the above scenario. So, ideally, the first record to decode by such a walsender should be parameter_change which will anyway error out. So, this shouldn't be a problem. Agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/6/23 7:59 AM, Amit Kapila wrote: On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand wrote: On 4/5/23 12:28 PM, Amit Kapila wrote: On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand wrote: Maybe we could change the doc with something among those lines instead? " Existing logical slots on standby also get invalidated if wal_level on primary is reduced to less than 'logical'. This is done as soon as the standby detects such a change in the WAL stream. It means, that for walsenders that are lagging (if any), some WAL records up to the parameter change on the primary won't be decoded". I don't know whether this is what one would expect but that should be less of a surprise if documented. What do you think? Yeah, I think it is better to document to avoid any surprises if nobody else sees any problem with it. Ack. This doesn't seem to be addressed in the latest version. Right, I was waiting if "nobody else sees any problem with it". Added it now in V62 posted up-thread. And today, I think I see one more point about this doc change: + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. If hot_standby_feedback is not set then can logical decoding on standby misbehave? If so, that is not very clear from this doc change if that is acceptable. I don't think it would misbehave but that primary may delete system catalog rows that could be needed by the logical decoding on the standby (as it does not know about the catalog_xmin on the standby). Added this remark in V62. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/6/23 11:55 AM, Amit Kapila wrote: On Thu, Apr 6, 2023 at 12:10 PM Amit Kapila wrote: On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand wrote: Another comment on 0001. extern void CheckSlotRequirements(void); extern void CheckSlotPermissions(void); +extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid, char *reason); This doesn't seem to be called from anywhere. Few other comments: == 0004 1. + * - physical walsenders in case of new time line and cascade + * replication is allowed. + * - logical walsenders in case of new time line or recovery is in progress + * (logical decoding on standby). + */ + WalSndWakeup(switchedTLI && AllowCascadeReplication(), + switchedTLI || RecoveryInProgress()); Do we need AllowCascadeReplication() check specifically for physical walsenders? I think this should be true for both physical and logical walsenders. I don't think it could be possible to create logical walsenders on a standby if AllowCascadeReplication() is not true, or am I missing something? If so, I think it has to be set to true for the logical walsenders in all the case (like done in V62 posted up-thread). Andres, made the point up-thread that RecoveryInProgress() is always true, and as we don't want to be woken up only when there is a time line change then I think it has to be always true for logical walsenders. 0005 2. --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -38,6 +38,7 @@ #include "utils/pg_lsn.h" #include "utils/timestamp.h" #include "utils/tuplestore.h" +#include "storage/standby.h" The header includes should be in alphabetical order. Good catch, thanks! Done in V62. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/6/23 8:40 AM, Amit Kapila wrote: On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand After this, I think for backends that have active slots, it would simply cancel the current query. Will that be sufficient? Because we want the backend process should exit and release the slot so that the startup process can mark it invalid. For walsender, an ERROR will lead to its exit, so that is fine. If this understanding is correct, then if 'am_cascading_walsender' is false, we should set ProcDiePending apart from other parameters. Sorry, I haven't tested this, so I could be wrong here. Oops my bad. You are fully, right. Fixed in V62 posted up-thread Also, it seems you have removed the checks related to slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only used for logical slots? If so, do you think an Assert would make sense? Yes, indeed adding an Assert makes sense: done in V62 posted up-thread. Another comment on 0001. extern void CheckSlotRequirements(void); extern void CheckSlotPermissions(void); +extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid, char *reason); This doesn't seem to be called from anywhere. Good catch, removed in V62 posted up-thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
On Thu, Apr 6, 2023 at 11:29 AM Amit Kapila wrote: > > > > > This doesn't seem to be addressed in the latest version. And today, I > think I see one more point about this doc change: > + > + A logical replication slot can also be created on a hot standby. > To prevent > + VACUUM from removing required rows from the system > + catalogs, hot_standby_feedback should be set on the > + standby. In spite of that, if any required rows get removed, the slot > gets > + invalidated. It's highly recommended to use a physical slot > between the primary > + and the standby. Otherwise, hot_standby_feedback will work, but > only while the > + connection is alive (for example a node restart would break it). > Existing > + logical slots on standby also get invalidated if wal_level on > primary is reduced to > + less than 'logical'. > > If hot_standby_feedback is not set then can logical decoding on > standby misbehave? If so, that is not very clear from this doc change > if that is acceptable. One scenario where I think it can misbehave is > if applying WAL records generated after changing wal_level from > 'logical' to 'replica' physically removes catalog tuples that could be > referenced by the logical decoding on the standby. Now, as mentioned > in patch 0003's comment in decode.c that it is possible that some > slots may creep even after we invalidate the slots on parameter > change, so while decoding using that slot if some required catalog > tuple has been removed by physical replication then the decoding can > misbehave even before reaching XLOG_PARAMETER_CHANGE record. > Thinking some more on this, I think such a slot won't decode any other records. During CreateInitDecodingContext->ReplicationSlotReserveWal, for standby's, we use lastReplayedEndRecPtr as restart_lsn. This should be a record before parameter_change record in the above scenario. So, ideally, the first record to decode by such a walsender should be parameter_change which will anyway error out. So, this shouldn't be a problem. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
On Thu, Apr 6, 2023 at 12:10 PM Amit Kapila wrote: > > On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand > wrote: > > > > Another comment on 0001. > extern void CheckSlotRequirements(void); > extern void CheckSlotPermissions(void); > +extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid, > TransactionId xid, char *reason); > > This doesn't seem to be called from anywhere. > Few other comments: == 0004 1. + * - physical walsenders in case of new time line and cascade + * replication is allowed. + * - logical walsenders in case of new time line or recovery is in progress + * (logical decoding on standby). + */ + WalSndWakeup(switchedTLI && AllowCascadeReplication(), + switchedTLI || RecoveryInProgress()); Do we need AllowCascadeReplication() check specifically for physical walsenders? I think this should be true for both physical and logical walsenders. 0005 2. --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -38,6 +38,7 @@ #include "utils/pg_lsn.h" #include "utils/timestamp.h" #include "utils/tuplestore.h" +#include "storage/standby.h" The header includes should be in alphabetical order. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
On Wed, Apr 5, 2023 at 9:27 PM Drouvot, Bertrand wrote: > > On 4/5/23 3:15 PM, Amit Kapila wrote: > > On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand > > wrote: > >> > >> On 4/5/23 12:28 PM, Amit Kapila wrote: > >>> On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand > >>> wrote: > >> > >>> minor nitpick: > >>> + > >>> + /* Intentional fall through to session cancel */ > >>> + /* FALLTHROUGH */ > >>> > >>> Do we need to repeat fall through twice in different ways? > >>> > >> > >> Do you mean, you'd prefer what was done in v52/0002? > >> > > > > No, I was thinking that instead of two comments, we need one here. > > But, now thinking about it, do we really need to fall through in this > > case, if so why? Shouldn't this case be handled after > > PROCSIG_RECOVERY_CONFLICT_DATABASE? > > > > Indeed, thanks! Done in V61 posted up-thread. > After this, I think for backends that have active slots, it would simply cancel the current query. Will that be sufficient? Because we want the backend process should exit and release the slot so that the startup process can mark it invalid. For walsender, an ERROR will lead to its exit, so that is fine. If this understanding is correct, then if 'am_cascading_walsender' is false, we should set ProcDiePending apart from other parameters. Sorry, I haven't tested this, so I could be wrong here. Also, it seems you have removed the checks related to slots, is it because PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT is only used for logical slots? If so, do you think an Assert would make sense? Another comment on 0001. extern void CheckSlotRequirements(void); extern void CheckSlotPermissions(void); +extern void ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid, char *reason); This doesn't seem to be called from anywhere. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand wrote: > > On 4/5/23 12:28 PM, Amit Kapila wrote: > > On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand > > wrote: > >> Maybe we could change the doc with something among those lines instead? > >> > >> " > >> Existing logical slots on standby also get invalidated if wal_level on > >> primary is reduced to > >> less than 'logical'. This is done as soon as the standby detects such a > >> change in the WAL stream. > >> > >> It means, that for walsenders that are lagging (if any), some WAL records > >> up to the parameter change on the > >> primary won't be decoded". > >> > >> I don't know whether this is what one would expect but that should be less > >> of a surprise if documented. > >> > >> What do you think? > >> > > > > Yeah, I think it is better to document to avoid any surprises if > > nobody else sees any problem with it. > > Ack. > This doesn't seem to be addressed in the latest version. And today, I think I see one more point about this doc change: + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. If hot_standby_feedback is not set then can logical decoding on standby misbehave? If so, that is not very clear from this doc change if that is acceptable. One scenario where I think it can misbehave is if applying WAL records generated after changing wal_level from 'logical' to 'replica' physically removes catalog tuples that could be referenced by the logical decoding on the standby. Now, as mentioned in patch 0003's comment in decode.c that it is possible that some slots may creep even after we invalidate the slots on parameter change, so while decoding using that slot if some required catalog tuple has been removed by physical replication then the decoding can misbehave even before reaching XLOG_PARAMETER_CHANGE record. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 2023-04-05 17:56:14 +0200, Drouvot, Bertrand wrote: > @@ -7963,6 +7963,23 @@ xlog_redo(XLogReaderState *record) > /* Update our copy of the parameters in pg_control */ > memcpy(&xlrec, XLogRecGetData(record), > sizeof(xl_parameter_change)); > > + /* > + * Invalidate logical slots if we are in hot standby and the > primary > + * does not have a WAL level sufficient for logical decoding. > No need > + * to search for potentially conflicting logically slots if > standby is > + * running with wal_level lower than logical, because in that > case, we > + * would have either disallowed creation of logical slots or > + * invalidated existing ones. > + */ > + if (InRecovery && InHotStandby && > + xlrec.wal_level < WAL_LEVEL_LOGICAL && > + wal_level >= WAL_LEVEL_LOGICAL) > + { > + TransactionId ConflictHorizon = InvalidTransactionId; > + > + InvalidateObsoleteReplicationSlots(InvalidXLogRecPtr, > InvalidOid, &ConflictHorizon); > + } I mentioned this before, but I still don't understand why InvalidateObsoleteReplicationSlots() accepts ConflictHorizon as a pointer. It's not even modified, as far as I can see? > /* > * Report shared-memory space needed by ReplicationSlotsShmemInit. > */ > @@ -855,8 +862,7 @@ ReplicationSlotsComputeRequiredXmin(bool already_locked) > SpinLockAcquire(&s->mutex); > effective_xmin = s->effective_xmin; > effective_catalog_xmin = s->effective_catalog_xmin; > - invalidated = (!XLogRecPtrIsInvalid(s->data.invalidated_at) && > - > XLogRecPtrIsInvalid(s->data.restart_lsn)); > + invalidated = ObsoleteSlotIsInvalid(s, true) || > LogicalReplicationSlotIsInvalid(s); > SpinLockRelease(&s->mutex); I don't understand why we need to have two different functions for this. > /* invalidated slots need not apply */ > @@ -1225,28 +1231,92 @@ ReplicationSlotReserveWal(void) > } > } > > + > +/* > + * Report terminating or conflicting message. > + * > + * For both, logical conflict on standby and obsolete slot are handled. > + */ > +static void > +ReportTerminationInvalidation(bool terminating, bool check_on_xid, int pid, > + NameData slotname, > TransactionId *xid, > + XLogRecPtr > restart_lsn, XLogRecPtr oldestLSN) > +{ > + StringInfoData err_msg; > + StringInfoData err_detail; > + boolhint = false; > + > + initStringInfo(&err_detail); > + > + if (check_on_xid) > + { > + if (!terminating) > + { > + initStringInfo(&err_msg); > + appendStringInfo(&err_msg, _("invalidating replication > slot \"%s\" because it conflicts with recovery"), > + NameStr(slotname)); I still don't think the main error message should differ between invalidating a slot due recovery and max_slot_wal_keep_size. > + > /* > - * Helper for InvalidateObsoleteReplicationSlots -- acquires the given slot > - * and mark it invalid, if necessary and possible. > + * Helper for InvalidateObsoleteReplicationSlots > + * > + * Acquires the given slot and mark it invalid, if necessary and possible. > * > * Returns whether ReplicationSlotControlLock was released in the interim > (and > * in that case we're not holding the lock at return, otherwise we are). > * > - * Sets *invalidated true if the slot was invalidated. (Untouched otherwise.) > + * Sets *invalidated true if an obsolete slot was invalidated. (Untouched > otherwise.) What's the point of making this specific to "obsolete slots"? > * This is inherently racy, because we release the LWLock > * for syscalls, so caller must restart if we return true. > */ > static bool > InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, > -bool *invalidated) > +bool *invalidated, > TransactionId *xid) > { > int last_signaled_pid = 0; > boolreleased_lock = false; > + boolcheck_on_xid; > + > + check_on_xid = xid ? true : false; > > for (;;) > { > XLogRecPtr restart_lsn; > + > NameDataslotname; > int active_pid = 0; > > @@ -1263,19 +1333,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, > XLogRecPtr oldestLSN, >* Check if the slot needs to be invalidated. If it needs to be >* invalidated, and is not currently
Re: Minimal logical decoding on standbys
Hi, On 4/5/23 4:24 PM, Robert Haas wrote: On Tue, Apr 4, 2023 at 8:33 PM Jeff Davis wrote: For comments, I agree that WalSndWakeup() clearly needs a comment update. The call site in ApplyWalRecord() could also use a comment. You could add a comment at every call site, but I don't think that's necessary if there's a good comment over WalSndWakeup(). Right, we don't want to go overboard, but I think putting some of the text you wrote above for the commit message, or something with a similar theme, in the comment for WalSndWakeup() would be quite helpful. We want people to understand why the physical and logical cases are different. Gave it a try in V61 posted up-thread. I agree with you that ApplyWalRecord() is the other place where we need a good comment. I think the one in v60 needs more word-smithing. It should probably be a bit more detailed and clear about not only what we're doing but why we're doing it. Gave it a try in V61 posted up-thread. Now that I understand what's going on here a bit better, I'm inclined to think that this patch is basically fine. At least, I don't see any obvious problem with it. Thanks for the review and feedback! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/5/23 3:15 PM, Amit Kapila wrote: On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand wrote: On 4/5/23 12:28 PM, Amit Kapila wrote: On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand wrote: minor nitpick: + + /* Intentional fall through to session cancel */ + /* FALLTHROUGH */ Do we need to repeat fall through twice in different ways? Do you mean, you'd prefer what was done in v52/0002? No, I was thinking that instead of two comments, we need one here. But, now thinking about it, do we really need to fall through in this case, if so why? Shouldn't this case be handled after PROCSIG_RECOVERY_CONFLICT_DATABASE? Indeed, thanks! Done in V61 posted up-thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/5/23 1:59 PM, Amit Kapila wrote: On Wed, Apr 5, 2023 at 3:58 PM Amit Kapila wrote: On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand wrote: minor nitpick: + + /* Intentional fall through to session cancel */ + /* FALLTHROUGH */ Do we need to repeat fall through twice in different ways? Few minor comments on 0003: 1. + case XLOG_PARAMETER_CHANGE: + { + xl_parameter_change *xlrec = + (xl_parameter_change *) XLogRecGetData(buf->record); + + /* + * If wal_level on primary is reduced to less than logical, + * then we want to prevent existing logical slots from being + * used. Existing logical slots on standby get invalidated + * when this WAL record is replayed; and further, slot + * creation fails when the wal level is not sufficient; but + * all these operations are not synchronized, so a logical + * slot may creep in while the wal_level is being reduced. + * Hence this extra check. + */ + if (xlrec->wal_level < WAL_LEVEL_LOGICAL) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("logical decoding on standby requires wal_level to be at least logical on the primary server"))); By looking at this change, it is not very clear that this can occur only on standby. I understand that on primary, we will not allow restarting the server after changing wal_level if there is a pre-existing slot but still this looks a bit odd. Shall we have an Assert to indicate that this will occur only on standby? I think that's a fair point. Adding an Assert and a comment before the Assert in V61 attached. 2. /* - * Since logical decoding is only permitted on a primary server, we know - * that the current timeline ID can't be changing any more. If we did this - * on a standby, we'd have to worry about the values we compute here - * becoming invalid due to a promotion or timeline change. + * Since logical decoding is also permitted on a standby server, we need + * to check if the server is in recovery to decide how to get the current + * timeline ID (so that it also cover the promotion or timeline change + * cases). */ + + /* make sure we have enough WAL available */ + flushptr = WalSndWaitForWal(targetPagePtr + reqLen); + + /* the standby could have been promoted, so check if still in recovery */ + am_cascading_walsender = RecoveryInProgress(); The first part of the comment explains why it is important to check RecoveryInProgress() and then immediately after that, the patch invokes WalSndWaitForWal(). It may be better to move the comment after WalSndWaitForWal() invocation. Good catch, thanks! done in V61. Also, it will be better to write a comment as to why you need to do WalSndWaitForWal() before retrieving the current timeline as previously that was done afterward. Agree, done in V61. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom 1bdbea718682cce4953310314759863302b0c0ea Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 14:08:11 + Subject: [PATCH v61 6/6] Doc changes describing details about logical decoding. Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/logicaldecoding.sgml | 22 ++ 1 file changed, 22 insertions(+) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 4e912b4bd4..3da254ed1f 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -316,6 +316,28 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU may consume changes from a slot at any given time. + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. + + + + For a logical slot to be created, it builds a historic snapshot, for which + information of all the currently running transactions is essential. On + primary, this information is available, but on standby, this information + has to be obtained from primary. So, slot creation may wait for some + activity to happen on the primary. If the primary is idle, creating a + logical slot on standby may take a noticeable time. One option to speed it + is to call the pg_log_standby_snapshot on the primary. +
Re: Minimal logical decoding on standbys
On Tue, Apr 4, 2023 at 8:33 PM Jeff Davis wrote: > Perhaps a commit message like: > > "For cascading replication, wake up physical walsenders separately from > logical walsenders. > > Physical walsenders can't send data until it's been flushed; logical > walsenders can't decode and send data until it's been applied. On the > standby, the WAL is flushed first, which will only wake up physical > walsenders; and then applied, which will only wake up logical > walsenders. > > Previously, all walsenders were awakened when the WAL was flushed. That > was fine for logical walsenders on the primary; but on the standby the > flushed WAL would not have been applied yet, so logical walsenders were > awakened too early." This sounds great. I think it's very clear about what is being changed and why. I see that Bertrand already pulled this language into v60. > For comments, I agree that WalSndWakeup() clearly needs a comment > update. The call site in ApplyWalRecord() could also use a comment. You > could add a comment at every call site, but I don't think that's > necessary if there's a good comment over WalSndWakeup(). Right, we don't want to go overboard, but I think putting some of the text you wrote above for the commit message, or something with a similar theme, in the comment for WalSndWakeup() would be quite helpful. We want people to understand why the physical and logical cases are different. I agree with you that ApplyWalRecord() is the other place where we need a good comment. I think the one in v60 needs more word-smithing. It should probably be a bit more detailed and clear about not only what we're doing but why we're doing it. The comment in InitWalSenderSlot() seems like it might be slightly overdone, but I don't have a big problem with it so if we leave it as-is that's fine. Now that I understand what's going on here a bit better, I'm inclined to think that this patch is basically fine. At least, I don't see any obvious problem with it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Minimal logical decoding on standbys
On Wed, Apr 5, 2023 at 6:14 PM Drouvot, Bertrand wrote: > > On 4/5/23 12:28 PM, Amit Kapila wrote: > > On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand > > wrote: > > > minor nitpick: > > + > > + /* Intentional fall through to session cancel */ > > + /* FALLTHROUGH */ > > > > Do we need to repeat fall through twice in different ways? > > > > Do you mean, you'd prefer what was done in v52/0002? > No, I was thinking that instead of two comments, we need one here. But, now thinking about it, do we really need to fall through in this case, if so why? Shouldn't this case be handled after PROCSIG_RECOVERY_CONFLICT_DATABASE? -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 4/5/23 12:28 PM, Amit Kapila wrote: On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand wrote: Maybe we could change the doc with something among those lines instead? " Existing logical slots on standby also get invalidated if wal_level on primary is reduced to less than 'logical'. This is done as soon as the standby detects such a change in the WAL stream. It means, that for walsenders that are lagging (if any), some WAL records up to the parameter change on the primary won't be decoded". I don't know whether this is what one would expect but that should be less of a surprise if documented. What do you think? Yeah, I think it is better to document to avoid any surprises if nobody else sees any problem with it. Ack. BTW, another thought that crosses my mind is that let's not invalidate the slots when the standby startup process processes parameter_change record and rather do it when walsender decodes the parameter_change record, if we think that is safe. I have shared this as this crosses my mind while thinking about this part of the patch and wanted to validate my thoughts, we don't need to change even if the idea is valid. I think this is a valid idea but I think I do prefer the current one (where the startup process triggers the invalidations) because: - I think this is better to invalidate as soon as possible. In case of inactive logical replication slot (walsenders stopped) it could take time to get "notified". While with the current approach you'd get notified in the logfile and pg_replication_slots even if walsenders are stopped. - This is not a "slot" dependent invalidation (as opposed to the xid invalidations case) - This is "somehow" the same behavior as on the primary: if one change the wal_level to be < logical then the engine will not start (if logical slot in place). Then what has been decoded is until the time the engine has been stopped. So if there is walsender lag, you'd not see some records. minor nitpick: + + /* Intentional fall through to session cancel */ + /* FALLTHROUGH */ Do we need to repeat fall through twice in different ways? Do you mean, you'd prefer what was done in v52/0002? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
On Wed, Apr 5, 2023 at 3:58 PM Amit Kapila wrote: > > On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand > wrote: > > minor nitpick: > + > + /* Intentional fall through to session cancel */ > + /* FALLTHROUGH */ > > Do we need to repeat fall through twice in different ways? > Few minor comments on 0003: 1. + case XLOG_PARAMETER_CHANGE: + { + xl_parameter_change *xlrec = + (xl_parameter_change *) XLogRecGetData(buf->record); + + /* + * If wal_level on primary is reduced to less than logical, + * then we want to prevent existing logical slots from being + * used. Existing logical slots on standby get invalidated + * when this WAL record is replayed; and further, slot + * creation fails when the wal level is not sufficient; but + * all these operations are not synchronized, so a logical + * slot may creep in while the wal_level is being reduced. + * Hence this extra check. + */ + if (xlrec->wal_level < WAL_LEVEL_LOGICAL) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("logical decoding on standby requires wal_level to be at least logical on the primary server"))); By looking at this change, it is not very clear that this can occur only on standby. I understand that on primary, we will not allow restarting the server after changing wal_level if there is a pre-existing slot but still this looks a bit odd. Shall we have an Assert to indicate that this will occur only on standby? 2. /* - * Since logical decoding is only permitted on a primary server, we know - * that the current timeline ID can't be changing any more. If we did this - * on a standby, we'd have to worry about the values we compute here - * becoming invalid due to a promotion or timeline change. + * Since logical decoding is also permitted on a standby server, we need + * to check if the server is in recovery to decide how to get the current + * timeline ID (so that it also cover the promotion or timeline change + * cases). */ + + /* make sure we have enough WAL available */ + flushptr = WalSndWaitForWal(targetPagePtr + reqLen); + + /* the standby could have been promoted, so check if still in recovery */ + am_cascading_walsender = RecoveryInProgress(); The first part of the comment explains why it is important to check RecoveryInProgress() and then immediately after that, the patch invokes WalSndWaitForWal(). It may be better to move the comment after WalSndWaitForWal() invocation. Also, it will be better to write a comment as to why you need to do WalSndWaitForWal() before retrieving the current timeline as previously that was done afterward. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
On Wed, Apr 5, 2023 at 2:41 PM Drouvot, Bertrand wrote: > > On 4/5/23 8:59 AM, Amit Kapila wrote: > > On Mon, Apr 3, 2023 at 12:05 PM Amit Kapila wrote: > > > On further thinking, as such this shouldn't be a problem because all > > the WAL records before PARAMETER_CHANGE record will have sufficient > > information so that they can get decoded. However, with the current > > approach, the subscriber may not even receive the valid records before > > PARAMETER_CHANGE record. This is because startup process will > > terminate the walsenders while invaliding the slots and after restart > > the walsenders will exit because the corresponding slot will be an > > invalid slot. So, it is quite possible that walsender was lagging and > > wouldn't have sent records before the PARAMETER_CHANGE record making > > subscriber never receive those records that it should have received. > > Agree that would behave that way. > > > I don't know whether this is what one would expect. > > If one change wal_level to < logical on the primary, he should at least > know that: > > " > Existing > + logical slots on standby also get invalidated if wal_level on primary > is reduced to > + less than 'logical'. > " > > If the doc has been read (as the quote above is coming from 0006). > > I think that what is missing is the "when" the slots are invalidated. > > Maybe we could change the doc with something among those lines instead? > > " > Existing logical slots on standby also get invalidated if wal_level on > primary is reduced to > less than 'logical'. This is done as soon as the standby detects such a > change in the WAL stream. > > It means, that for walsenders that are lagging (if any), some WAL records up > to the parameter change on the > primary won't be decoded". > > I don't know whether this is what one would expect but that should be less of > a surprise if documented. > > What do you think? > Yeah, I think it is better to document to avoid any surprises if nobody else sees any problem with it. BTW, another thought that crosses my mind is that let's not invalidate the slots when the standby startup process processes parameter_change record and rather do it when walsender decodes the parameter_change record, if we think that is safe. I have shared this as this crosses my mind while thinking about this part of the patch and wanted to validate my thoughts, we don't need to change even if the idea is valid. minor nitpick: + + /* Intentional fall through to session cancel */ + /* FALLTHROUGH */ Do we need to repeat fall through twice in different ways? -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 4/5/23 8:59 AM, Amit Kapila wrote: On Mon, Apr 3, 2023 at 12:05 PM Amit Kapila wrote: On further thinking, as such this shouldn't be a problem because all the WAL records before PARAMETER_CHANGE record will have sufficient information so that they can get decoded. However, with the current approach, the subscriber may not even receive the valid records before PARAMETER_CHANGE record. This is because startup process will terminate the walsenders while invaliding the slots and after restart the walsenders will exit because the corresponding slot will be an invalid slot. So, it is quite possible that walsender was lagging and wouldn't have sent records before the PARAMETER_CHANGE record making subscriber never receive those records that it should have received. Agree that would behave that way. I don't know whether this is what one would expect. If one change wal_level to < logical on the primary, he should at least know that: " Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. " If the doc has been read (as the quote above is coming from 0006). I think that what is missing is the "when" the slots are invalidated. Maybe we could change the doc with something among those lines instead? " Existing logical slots on standby also get invalidated if wal_level on primary is reduced to less than 'logical'. This is done as soon as the standby detects such a change in the WAL stream. It means, that for walsenders that are lagging (if any), some WAL records up to the parameter change on the primary won't be decoded". I don't know whether this is what one would expect but that should be less of a surprise if documented. What do you think? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/5/23 2:33 AM, Jeff Davis wrote: On Tue, 2023-04-04 at 14:55 -0400, Robert Haas wrote: Thanks for your continued work on $SUBJECT. I just took a look at 0004, Thanks Robert for the feedback! and I think that at the very least the commit message needs work. Agree. Perhaps a commit message like: "For cascading replication, wake up physical walsenders separately from logical walsenders. Physical walsenders can't send data until it's been flushed; logical walsenders can't decode and send data until it's been applied. On the standby, the WAL is flushed first, which will only wake up physical walsenders; and then applied, which will only wake up logical walsenders. Previously, all walsenders were awakened when the WAL was flushed. That was fine for logical walsenders on the primary; but on the standby the flushed WAL would not have been applied yet, so logical walsenders were awakened too early." Thanks Jeff for the commit message proposal! It looks good to me except that I think that "flushed WAL could have been not applied yet" is better than "flushed WAL would not have been applied yet" but it's obviously open to discussion. Currently changed it that way and used it in V60 shared up-thread. For comments, I agree that WalSndWakeup() clearly needs a comment update. The call site in ApplyWalRecord() could also use a comment. You could add a comment at every call site, but I don't think that's necessary if there's a good comment over WalSndWakeup(). Agree, added a comment over WalSndWakeup() and one before calling WalSndWakeup() in ApplyWalRecord() in V60. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/4/23 8:13 PM, Jeff Davis wrote: On Tue, 2023-04-04 at 11:42 +0200, Drouvot, Bertrand wrote: Done in V58 and now this is as simple as: Minor comments on 0004 (address if you agree): Thanks for the review! * Consider static inline for WalSndWakeupProcessRequests()? Agree and done in V60 just shared up-thread. * Is the WalSndWakeup() in KeepFileRestoredFromArchive() more like the flush case? Why is the second argument unconditionally true? I don't think the cascading logical walsenders have anything to do until the WAL is actually applied. Agree and changed it to "(true, false)" in V60. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/4/23 7:53 PM, Andres Freund wrote: Hi, On 2023-04-04 18:54:33 +0200, Drouvot, Bertrand wrote: if (check_on_xid) { if (terminating) appendStringInfo(&err_msg, _("terminating process %d to release replication slot \"%s\" because it conflicts with recovery"), pid, NameStr(slotname)); FWIW, I would just use exactly the same error message as today here. errmsg("terminating process %d to release replication slot \"%s\"", active_pid, NameStr(slotname)), This is accurate for both the existing and the new case. Then there's no need to put that string into a stringinfo either. Right, thanks! Did it that way in V60 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom 4861568b10fe2187d38f2997ad916a984918aa5b Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 14:08:11 + Subject: [PATCH v60 6/6] Doc changes describing details about logical decoding. Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/logicaldecoding.sgml | 22 ++ 1 file changed, 22 insertions(+) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 4e912b4bd4..3da254ed1f 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -316,6 +316,28 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU may consume changes from a slot at any given time. + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. + + + + For a logical slot to be created, it builds a historic snapshot, for which + information of all the currently running transactions is essential. On + primary, this information is available, but on standby, this information + has to be obtained from primary. So, slot creation may wait for some + activity to happen on the primary. If the primary is idle, creating a + logical slot on standby may take a noticeable time. One option to speed it + is to call the pg_log_standby_snapshot on the primary. + + Replication slots persist across crashes and know nothing about the state -- 2.34.1 From 3dd93d8dbc601accedceae199e539ba74252e092 Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 09:04:12 + Subject: [PATCH v60 5/6] New TAP test for logical decoding on standby. In addition to the new TAP test, this commit introduces a new pg_log_standby_snapshot() function. The idea is to be able to take a snapshot of running transactions and write this to WAL without requesting for a (costly) checkpoint. Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/func.sgml| 15 + src/backend/access/transam/xlogfuncs.c| 32 + src/backend/catalog/system_functions.sql | 2 + src/include/catalog/pg_proc.dat | 3 + src/test/perl/PostgreSQL/Test/Cluster.pm | 37 + src/test/recovery/meson.build | 1 + .../t/035_standby_logical_decoding.pl | 705 ++ 7 files changed, 795 insertions(+) 3.1% src/backend/ 4.0% src/test/perl/PostgreSQL/Test/ 89.7% src/test/recovery/t/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 918a492234..939fb8019f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset prepared with . + + + + pg_log_standby_snapshot + +pg_log_standby_snapshot () +pg_lsn + + +Take a snapshot of running transactions and write this to WAL without +having to wait bgwriter or checkpointer to log one. This one is useful for +logical decoding on standby for which logical slot creation is hanging +until such a record is
Re: Minimal logical decoding on standbys
On Mon, Apr 3, 2023 at 12:05 PM Amit Kapila wrote: > > On Mon, Apr 3, 2023 at 4:39 AM Alvaro Herrera wrote: > > > > > From 56a9559555918a99c202a0924f7b2ede9de4e75d Mon Sep 17 00:00:00 2001 > > > From: bdrouvotAWS > > > Date: Tue, 7 Feb 2023 08:59:47 + > > > Subject: [PATCH v52 3/6] Allow logical decoding on standby. > > > > > > Allow a logical slot to be created on standby. Restrict its usage > > > or its creation if wal_level on primary is less than logical. > > > During slot creation, it's restart_lsn is set to the last replayed > > > LSN. Effectively, a logical slot creation on standby waits for an > > > xl_running_xact record to arrive from primary. > > > > Hmm, not sure if it really applies here, but this sounds similar to > > issues with track_commit_timestamps: namely, if the primary has it > > enabled and you start a standby with it enabled, that's fine; but if the > > primary is later shut down (but the standby isn't) and then the primary > > restarted with a lesser value, then the standby would misbehave without > > any obvious errors. > > > > IIUC, the patch deals it by invalidating logical slots while replaying > the XLOG_PARAMETER_CHANGE record on standby. Then later during > decoding, if it encounters XLOG_PARAMETER_CHANGE, and wal_level from > primary has been reduced, it will return an error. There is a race > condition here as explained in the patch as follows: > > + /* > + * If wal_level on primary is reduced to less than logical, then we > + * want to prevent existing logical slots from being used. > + * Existing logical slots on standby get invalidated when this WAL > + * record is replayed; and further, slot creation fails when the > + * wal level is not sufficient; but all these operations are not > + * synchronized, so a logical slot may creep in while the wal_level > + * is being reduced. Hence this extra check. > + */ > + if (xlrec->wal_level < WAL_LEVEL_LOGICAL) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("logical decoding on standby requires " > + "wal_level >= logical on master"))); > > Now, during this race condition, say not only does a logical slot > creep in but also one tries to decode WAL using the same then some > misbehavior is expected. I have not tried this so not sure if this is > really a problem but are you worried about something along those > lines? > On further thinking, as such this shouldn't be a problem because all the WAL records before PARAMETER_CHANGE record will have sufficient information so that they can get decoded. However, with the current approach, the subscriber may not even receive the valid records before PARAMETER_CHANGE record. This is because startup process will terminate the walsenders while invaliding the slots and after restart the walsenders will exit because the corresponding slot will be an invalid slot. So, it is quite possible that walsender was lagging and wouldn't have sent records before the PARAMETER_CHANGE record making subscriber never receive those records that it should have received. I don't know whether this is what one would expect. One other observation is that once this error has been raised both standby and subscriber will keep on getting this error in the loop unless the user manually disables the subscription on the subscriber. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 2023-04-04 17:33:25 -0700, Jeff Davis wrote: > On Tue, 2023-04-04 at 14:55 -0400, Robert Haas wrote: > > That's presumably OK, in the > > sense that they'll go back to sleep and eventually wake up again, but > > it means they might end up chronically behind sending out WAL to > > cascading standbys. > > Without 0004, cascading logical walsenders would have worse wakeup > behavior than logical walsenders on the primary. Assuming the fix is > small in scope and otherwise acceptable, I think it belongs as a part > of this overall series. FWIW, personally, I wouldn't feel ok with committing 0003 without 0004. And IMO they ought to be committed the other way round. The stalls you *can* get, depending on the speed of WAL apply and OS scheduling, can be long. This is actually why a predecessor version of the feature had a bunch of sleeps and retries in the tests, just to avoid those stalls. Obviously that's not a good path... Greetings, Andres Freund
Re: Minimal logical decoding on standbys
On Tue, 2023-04-04 at 14:55 -0400, Robert Haas wrote: > Thanks for your continued work on $SUBJECT. I just took a look at > 0004, and I think that at the very least the commit message needs > work. Nobody who is not a hacker is going to understand what problem > this is fixing, because it makes reference to the names of functions > and structure members rather than user-visible behavior. In fact, I'm > not really sure that I understand the problem myself. It seems like > the problem is that on a standby, WAL senders will get woken up too > early, before we have any WAL to send. Logical walsenders on the standby, specifically, which didn't exist before this patch series. > That's presumably OK, in the > sense that they'll go back to sleep and eventually wake up again, but > it means they might end up chronically behind sending out WAL to > cascading standbys. Without 0004, cascading logical walsenders would have worse wakeup behavior than logical walsenders on the primary. Assuming the fix is small in scope and otherwise acceptable, I think it belongs as a part of this overall series. > If that's right, I think it should be spelled out > more clearly in the commit message, and maybe also in the code > comments. Perhaps a commit message like: "For cascading replication, wake up physical walsenders separately from logical walsenders. Physical walsenders can't send data until it's been flushed; logical walsenders can't decode and send data until it's been applied. On the standby, the WAL is flushed first, which will only wake up physical walsenders; and then applied, which will only wake up logical walsenders. Previously, all walsenders were awakened when the WAL was flushed. That was fine for logical walsenders on the primary; but on the standby the flushed WAL would not have been applied yet, so logical walsenders were awakened too early." (I'm not sure if I quite got the verb tenses right.) For comments, I agree that WalSndWakeup() clearly needs a comment update. The call site in ApplyWalRecord() could also use a comment. You could add a comment at every call site, but I don't think that's necessary if there's a good comment over WalSndWakeup(). Regards, Jeff Davis
Re: Minimal logical decoding on standbys
On Tue, Apr 4, 2023 at 5:44 AM Drouvot, Bertrand wrote: > Oh right, even better, thanks! > Done in V58 and now this is as simple as: > > + if (DoNotInvalidateSlot(s, xid, &oldestLSN)) > { > /* then, we are not forcing for invalidation */ Thanks for your continued work on $SUBJECT. I just took a look at 0004, and I think that at the very least the commit message needs work. Nobody who is not a hacker is going to understand what problem this is fixing, because it makes reference to the names of functions and structure members rather than user-visible behavior. In fact, I'm not really sure that I understand the problem myself. It seems like the problem is that on a standby, WAL senders will get woken up too early, before we have any WAL to send. That's presumably OK, in the sense that they'll go back to sleep and eventually wake up again, but it means they might end up chronically behind sending out WAL to cascading standbys. If that's right, I think it should be spelled out more clearly in the commit message, and maybe also in the code comments. But the weird thing is that most (all?) of the patch doesn't seem to be about that issue at all. Instead, it's about separating wakeups of physical walsenders from wakeups of logical walsenders. I don't see how that could ever fix the kind of problem I mentioned in the preceding paragraph, so my guess is that this is a separate change. But this change doesn't really seem adequately justified. The commit message says that it "helps to filter what kind of walsender we want to wakeup based on the code path" but that's awfully vague about what the actual benefit is. I wonder whether many people have a mix of physical and logical systems connecting to the same machine such that this would even help, and if they do have that, would this really do enough to solve any performance problem that might be caused by too many wakeups? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Minimal logical decoding on standbys
On Tue, 2023-04-04 at 11:42 +0200, Drouvot, Bertrand wrote: > Done in V58 and now this is as simple as: Minor comments on 0004 (address if you agree): * Consider static inline for WalSndWakeupProcessRequests()? * Is the WalSndWakeup() in KeepFileRestoredFromArchive() more like the flush case? Why is the second argument unconditionally true? I don't think the cascading logical walsenders have anything to do until the WAL is actually applied. Otherwise, looks good! Regards, Jeff Davis
Re: Minimal logical decoding on standbys
Hi, On 2023-04-04 18:54:33 +0200, Drouvot, Bertrand wrote: > if (check_on_xid) > { > if (terminating) > appendStringInfo(&err_msg, _("terminating process %d to release > replication slot \"%s\" because it conflicts with recovery"), > pid, > NameStr(slotname)); FWIW, I would just use exactly the same error message as today here. errmsg("terminating process %d to release replication slot \"%s\"", active_pid, NameStr(slotname)), This is accurate for both the existing and the new case. Then there's no need to put that string into a stringinfo either. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, On 2023-04-04 13:21:38 +0200, Alvaro Herrera wrote: > On 2023-Apr-03, Andres Freund wrote: > > > Hm? That's what the _'s do. We build strings in parts in other places too. > > No, what _() does is mark each piece for translation separately. But a > translation cannot be done on string pieces, and later have all the > pieces appended together to form a full sentence. Let me show the > "!terminating" case as example and grab some translations for it from > src/backend/po/de.po: > > "invalidating" -> "... wird ungültig gemacht" (?) > > (if logical) " obsolete replication" -> " obsolete Replikation" > > " slot \"%s\" because it conflicts with recovery" -> " Slot \"%s\", weil sie > in Konflikt mit Wiederherstellung steht" > > If you just concatenate all the translated phrases together, the > resulting string will make no sense; keep in mind the "obsolete > replication" part may or not may not be there. And there's no way to > make that work: even if you found an ordering of the English parts that > allows you to translate each piece separately and have it make sense for > German, the same won't work for Spanish or Japanese. Ah, I misunderstood the angle you're coming from. Yes, the pieces need to be reasonable fragments, instead of half-sentences. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, On 4/4/23 1:21 PM, Alvaro Herrera wrote: Hi, On 2023-Apr-03, Andres Freund wrote: Hm? That's what the _'s do. We build strings in parts in other places too. No, what _() does is mark each piece for translation separately. But a translation cannot be done on string pieces, and later have all the pieces appended together to form a full sentence. Let me show the "!terminating" case as example and grab some translations for it from src/backend/po/de.po: "invalidating" -> "... wird ungültig gemacht" (?) (if logical) " obsolete replication" -> " obsolete Replikation" " slot \"%s\" because it conflicts with recovery" -> " Slot \"%s\", weil sie in Konflikt mit Wiederherstellung steht" If you just concatenate all the translated phrases together, the resulting string will make no sense; keep in mind the "obsolete replication" part may or not may not be there. And there's no way to make that work: even if you found an ordering of the English parts that allows you to translate each piece separately and have it make sense for German, the same won't work for Spanish or Japanese. You have to give the translator a complete phrase and let them turn into a complete translated phrases. Building from parts doesn't work. We're very good at avoiding string building; we have a couple of cases, but they are *very* minor. string 1 "invalidating slot \"%s\" because it conflicts with recovery" string 2 "invalidating obsolete replication slot \"%s\" because it conflicts with recovery" Thanks for looking at it and the explanations! (I'm not clear on why did Bertrand omitted the word "replication" in the case where the slot is not logical) It makes more sense to add it, will do thanks! I think the errdetail() are okay, it's the errmsg() bits that are bogus. And yes, well caught on having to use errmsg_internal and errdetail_internal() to avoid double translation. So, IIUC having something like this would be fine? " if (check_on_xid) { if (terminating) appendStringInfo(&err_msg, _("terminating process %d to release replication slot \"%s\" because it conflicts with recovery"), pid, NameStr(slotname)); else appendStringInfo(&err_msg, _("invalidating replication slot \"%s\" because it conflicts with recovery"), NameStr(slotname)); if (TransactionIdIsValid(*xid)) appendStringInfo(&err_detail, _("The slot conflicted with xid horizon %u."), *xid); else appendStringInfo(&err_detail, _("Logical decoding on standby requires wal_level to be at least logical on the primary server")); } else { if (terminating) appendStringInfo(&err_msg, _("terminating process %d to release replication slot \"%s\""), pid, NameStr(slotname)); else appendStringInfo(&err_msg, _("invalidating obsolete replication slot \"%s\""), NameStr(slotname)); appendStringInfo(&err_detail, _("The slot's restart_lsn %X/%X exceeds the limit by %llu bytes."), LSN_FORMAT_ARGS(restart_lsn), (unsigned long long) (oldestLSN - restart_lsn)); hint = true; } ereport(LOG, errmsg_internal("%s", err_msg.data), errdetail_internal("%s", err_detail.data), hint ? errhint("You might need to increase max_slot_wal_keep_size.") : 0); " as err_msg is not concatenated anymore (I mean it's just one sentence build one time) and this make use of errmsg_internal() and errdetail_internal(). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/4/23 3:43 PM, Amit Kapila wrote: On Tue, Apr 4, 2023 at 6:05 PM Drouvot, Bertrand I think we might want to consider slot's effective_xmin instead of data.xmin as we use that to store xmin_horizon when we build the full snapshot. Oh, I did not know about the 'effective_xmin' and was going to rely only on the catalog xmin. Reading the comment in the ReplicationSlot struct about the 'effective_xmin' I do think it makes sense to use it (instead of data.xmin). Please find attached v59 doing so. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom fad28278fa13bf3564c878aba57fb6d1e6623d59 Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 14:08:11 + Subject: [PATCH v59 6/6] Doc changes describing details about logical decoding. Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/logicaldecoding.sgml | 22 ++ 1 file changed, 22 insertions(+) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 4e912b4bd4..3da254ed1f 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -316,6 +316,28 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU may consume changes from a slot at any given time. + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. + + + + For a logical slot to be created, it builds a historic snapshot, for which + information of all the currently running transactions is essential. On + primary, this information is available, but on standby, this information + has to be obtained from primary. So, slot creation may wait for some + activity to happen on the primary. If the primary is idle, creating a + logical slot on standby may take a noticeable time. One option to speed it + is to call the pg_log_standby_snapshot on the primary. + + Replication slots persist across crashes and know nothing about the state -- 2.34.1 From 9e038e69816a1b0722c15515dbfbef3310198e39 Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 09:04:12 + Subject: [PATCH v59 5/6] New TAP test for logical decoding on standby. In addition to the new TAP test, this commit introduces a new pg_log_standby_snapshot() function. The idea is to be able to take a snapshot of running transactions and write this to WAL without requesting for a (costly) checkpoint. Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/func.sgml| 15 + src/backend/access/transam/xlogfuncs.c| 32 + src/backend/catalog/system_functions.sql | 2 + src/include/catalog/pg_proc.dat | 3 + src/test/perl/PostgreSQL/Test/Cluster.pm | 37 + src/test/recovery/meson.build | 1 + .../t/035_standby_logical_decoding.pl | 705 ++ 7 files changed, 795 insertions(+) 3.1% src/backend/ 4.0% src/test/perl/PostgreSQL/Test/ 89.7% src/test/recovery/t/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 918a492234..939fb8019f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset prepared with . + + + + pg_log_standby_snapshot + +pg_log_standby_snapshot () +pg_lsn + + +Take a snapshot of running transactions and write this to WAL without +having to wait bgwriter or checkpointer to log one. This one is useful for +logical decoding on standby for which logical slot creation is hanging +until such a record is replayed on the standby. + + diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index c07daa874f..481e9a47da 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -38,6 +38,7 @@ #include "utils/pg_lsn.h" #inc
Re: Minimal logical decoding on standbys
On Tue, Apr 4, 2023 at 6:05 PM Drouvot, Bertrand wrote: > > On 4/4/23 1:43 PM, Amit Kapila wrote: > > On Tue, Apr 4, 2023 at 3:14 PM Drouvot, Bertrand > > wrote: > >> > > > > > > +static inline bool > > +LogicalReplicationSlotXidsConflict(ReplicationSlot *s, TransactionId xid) > > +{ > > + TransactionId slot_xmin; > > + TransactionId slot_catalog_xmin; > > + > > + slot_xmin = s->data.xmin; > > + slot_catalog_xmin = s->data.catalog_xmin; > > + > > + return (((TransactionIdIsValid(slot_xmin) && > > TransactionIdPrecedesOrEquals(slot_xmin, xid)) || > > > > For logical slots, slot->data.xmin will always be an > > InvalidTransactionId. It will only be set/updated for physical slots. > > So, it is not clear to me why in this and other related functions, you > > are referring to and or invalidating it. > > > > I think you're right that invalidating/checking only on the catalog xmin is > enough for logical slot (I'm not sure how I ended up taking the xmin into > account but > that seems useless indeed). > I think we might want to consider slot's effective_xmin instead of data.xmin as we use that to store xmin_horizon when we build the full snapshot. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 4/4/23 1:43 PM, Amit Kapila wrote: On Tue, Apr 4, 2023 at 3:14 PM Drouvot, Bertrand wrote: +static inline bool +LogicalReplicationSlotXidsConflict(ReplicationSlot *s, TransactionId xid) +{ + TransactionId slot_xmin; + TransactionId slot_catalog_xmin; + + slot_xmin = s->data.xmin; + slot_catalog_xmin = s->data.catalog_xmin; + + return (((TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid)) || For logical slots, slot->data.xmin will always be an InvalidTransactionId. It will only be set/updated for physical slots. So, it is not clear to me why in this and other related functions, you are referring to and or invalidating it. I think you're right that invalidating/checking only on the catalog xmin is enough for logical slot (I'm not sure how I ended up taking the xmin into account but that seems useless indeed). I'll submit a new version to deal with the catalog xmin only, thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
On Tue, Apr 4, 2023 at 3:14 PM Drouvot, Bertrand wrote: > +static inline bool +LogicalReplicationSlotXidsConflict(ReplicationSlot *s, TransactionId xid) +{ + TransactionId slot_xmin; + TransactionId slot_catalog_xmin; + + slot_xmin = s->data.xmin; + slot_catalog_xmin = s->data.catalog_xmin; + + return (((TransactionIdIsValid(slot_xmin) && TransactionIdPrecedesOrEquals(slot_xmin, xid)) || For logical slots, slot->data.xmin will always be an InvalidTransactionId. It will only be set/updated for physical slots. So, it is not clear to me why in this and other related functions, you are referring to and or invalidating it. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 2023-Apr-03, Andres Freund wrote: > Hm? That's what the _'s do. We build strings in parts in other places too. No, what _() does is mark each piece for translation separately. But a translation cannot be done on string pieces, and later have all the pieces appended together to form a full sentence. Let me show the "!terminating" case as example and grab some translations for it from src/backend/po/de.po: "invalidating" -> "... wird ungültig gemacht" (?) (if logical) " obsolete replication" -> " obsolete Replikation" " slot \"%s\" because it conflicts with recovery" -> " Slot \"%s\", weil sie in Konflikt mit Wiederherstellung steht" If you just concatenate all the translated phrases together, the resulting string will make no sense; keep in mind the "obsolete replication" part may or not may not be there. And there's no way to make that work: even if you found an ordering of the English parts that allows you to translate each piece separately and have it make sense for German, the same won't work for Spanish or Japanese. You have to give the translator a complete phrase and let them turn into a complete translated phrases. Building from parts doesn't work. We're very good at avoiding string building; we have a couple of cases, but they are *very* minor. string 1 "invalidating slot \"%s\" because it conflicts with recovery" string 2 "invalidating obsolete replication slot \"%s\" because it conflicts with recovery" (I'm not clear on why did Bertrand omitted the word "replication" in the case where the slot is not logical) I think the errdetail() are okay, it's the errmsg() bits that are bogus. And yes, well caught on having to use errmsg_internal and errdetail_internal() to avoid double translation. Cheers -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: Minimal logical decoding on standbys
Hi, On 4/4/23 9:48 AM, Masahiko Sawada wrote: On Tue, Apr 4, 2023 at 10:55 AM Masahiko Sawada wrote: Regarding 0004 patch: @@ -2626,6 +2626,12 @@ InitWalSenderSlot(void) walsnd->sync_standby_priority = 0; walsnd->latch = &MyProc->procLatch; walsnd->replyTime = 0; + + if (MyDatabaseId == InvalidOid) + walsnd->kind = REPLICATION_KIND_PHYSICAL; + else + walsnd->kind = REPLICATION_KIND_LOGICAL; + I think we might want to set the replication kind when processing the START_REPLICATION command. The walsender using a logical replication slot is not necessarily streaming (e.g. when COPYing table data). Discussing with Bertrand off-list, it's wrong as the logical replication slot creation also needs to read WAL records so a walsender who is creating a logical replication slot needs to be woken up. We can set it the replication kind when processing START_REPLICATION and CREATE_REPLICATION_SLOT, but it seems better to set it in one place. So I agree to set it in InitWalSenderSlot(). Thanks for the review and feedback! Added a comment in 0004 in V58 just posted up-thread to explain the reason why the walsnd->kind assignment is done InitWalSenderSlot(). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/4/23 7:57 AM, Amit Kapila wrote: On Mon, Apr 3, 2023 at 8:51 PM Drouvot, Bertrand wrote: I made it as simple as: /* * If the slot is already invalid or is a non conflicting slot, we don't * need to do anything. */ islogical = xid ? true : false; if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, islogical, xid, &oldestLSN)) in V56 attached. Here the variable 'islogical' doesn't seem to convey its actual meaning because one can imagine that it indicates whether the slot is logical which I don't think is the actual intent. Good point. Just renamed it to 'check_on_xid' (as still needed outside of the "CanInvalidateSlot" context) in V58 attached. One idea to simplify this is to introduce a single function CanInvalidateSlot() or something like that and move the logic from both the functions SlotIsInvalid() and SlotIsNotConflicting() into the new function. Oh right, even better, thanks! Done in V58 and now this is as simple as: + if (DoNotInvalidateSlot(s, xid, &oldestLSN)) { /* then, we are not forcing for invalidation */ Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom 7c61edd93b4df1efb9723ddae41c009ccbac9f59 Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 14:08:11 + Subject: [PATCH v58 6/6] Doc changes describing details about logical decoding. Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/logicaldecoding.sgml | 22 ++ 1 file changed, 22 insertions(+) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 4e912b4bd4..3da254ed1f 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -316,6 +316,28 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU may consume changes from a slot at any given time. + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. + + + + For a logical slot to be created, it builds a historic snapshot, for which + information of all the currently running transactions is essential. On + primary, this information is available, but on standby, this information + has to be obtained from primary. So, slot creation may wait for some + activity to happen on the primary. If the primary is idle, creating a + logical slot on standby may take a noticeable time. One option to speed it + is to call the pg_log_standby_snapshot on the primary. + + Replication slots persist across crashes and know nothing about the state -- 2.34.1 From 14f9f51aaacd8889ef1f9853534c9303fc89f59f Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 09:04:12 + Subject: [PATCH v58 5/6] New TAP test for logical decoding on standby. In addition to the new TAP test, this commit introduces a new pg_log_standby_snapshot() function. The idea is to be able to take a snapshot of running transactions and write this to WAL without requesting for a (costly) checkpoint. Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/func.sgml| 15 + src/backend/access/transam/xlogfuncs.c| 32 + src/backend/catalog/system_functions.sql | 2 + src/include/catalog/pg_proc.dat | 3 + src/test/perl/PostgreSQL/Test/Cluster.pm | 37 + src/test/recovery/meson.build | 1 + .../t/035_standby_logical_decoding.pl | 705 ++ 7 files changed, 795 insertions(+) 3.1% src/backend/ 4.0% src/test/perl/PostgreSQL/Test/ 89.7% src/test/recovery/t/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 918a492234..939fb8019f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset prepared with . + + + + pg_log_standby_snapshot + +
Re: Minimal logical decoding on standbys
On Tue, Apr 4, 2023 at 10:55 AM Masahiko Sawada wrote: > > On Tue, Apr 4, 2023 at 3:17 AM Drouvot, Bertrand > wrote: > > > > Hi, > > > > On 4/3/23 8:10 AM, Drouvot, Bertrand wrote: > > > Hi, > > > > > > On 4/3/23 7:35 AM, Amit Kapila wrote: > > >> On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis wrote: > > >> > > >> Agreed, even Bertrand and myself discussed the same approach few > > >> emails above. BTW, if we have this selective logic to wake > > >> physical/logical walsenders and for standby's, we only wake logical > > >> walsenders at the time of ApplyWalRecord() then do we need the new > > >> conditional variable enhancement being discussed, and if so, why? > > >> > > > > > > Thank you both for this new idea and discussion. In that case I don't > > > think > > > we need the new CV API and the use of a CV anymore. As just said > > > up-thread I'll submit > > > a new proposal with this new approach. > > > > > > > Please find enclosed V57 implementing the new approach in 0004. > > Regarding 0004 patch: > > @@ -2626,6 +2626,12 @@ InitWalSenderSlot(void) > walsnd->sync_standby_priority = 0; > walsnd->latch = &MyProc->procLatch; > walsnd->replyTime = 0; > + > + if (MyDatabaseId == InvalidOid) > + walsnd->kind = REPLICATION_KIND_PHYSICAL; > + else > + walsnd->kind = REPLICATION_KIND_LOGICAL; > + > > I think we might want to set the replication kind when processing the > START_REPLICATION command. The walsender using a logical replication > slot is not necessarily streaming (e.g. when COPYing table data). > Discussing with Bertrand off-list, it's wrong as the logical replication slot creation also needs to read WAL records so a walsender who is creating a logical replication slot needs to be woken up. We can set it the replication kind when processing START_REPLICATION and CREATE_REPLICATION_SLOT, but it seems better to set it in one place. So I agree to set it in InitWalSenderSlot(). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
On Mon, Apr 3, 2023 at 8:51 PM Drouvot, Bertrand wrote: > > On 4/2/23 10:10 PM, Andres Freund wrote: > > Hi, > >> restart_lsn = s->data.restart_lsn; > >> - > >> -/* > >> - * If the slot is already invalid or is fresh enough, we > >> don't need to > >> - * do anything. > >> - */ > >> -if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= > >> oldestLSN) > >> +slot_xmin = s->data.xmin; > >> +slot_catalog_xmin = s->data.catalog_xmin; > >> + > >> +/* the slot has been invalidated (logical decoding conflict > >> case) */ > >> +if ((xid && ((LogicalReplicationSlotIsInvalid(s)) || > >> +/* or the xid is valid and this is a non conflicting slot */ > >> + (TransactionIdIsValid(*xid) && > >> !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, > >> *xid) || > >> +/* or the slot has been invalidated (obsolete LSN case) */ > >> +(!xid && (XLogRecPtrIsInvalid(restart_lsn) || > >> restart_lsn >= oldestLSN))) > >> { > > > > This still looks nearly unreadable. I suggest moving comments outside of the > > if (), remove redundant parentheses, use a function to detect if the slot > > has > > been invalidated. > > > > I made it as simple as: > > /* > * If the slot is already invalid or is a non conflicting slot, we > don't > * need to do anything. > */ > islogical = xid ? true : false; > > if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, > islogical, xid, &oldestLSN)) > > in V56 attached. > Here the variable 'islogical' doesn't seem to convey its actual meaning because one can imagine that it indicates whether the slot is logical which I don't think is the actual intent. One idea to simplify this is to introduce a single function CanInvalidateSlot() or something like that and move the logic from both the functions SlotIsInvalid() and SlotIsNotConflicting() into the new function. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
On Tue, Apr 4, 2023 at 3:17 AM Drouvot, Bertrand wrote: > > Hi, > > On 4/3/23 8:10 AM, Drouvot, Bertrand wrote: > > Hi, > > > > On 4/3/23 7:35 AM, Amit Kapila wrote: > >> On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis wrote: > >> > >> Agreed, even Bertrand and myself discussed the same approach few > >> emails above. BTW, if we have this selective logic to wake > >> physical/logical walsenders and for standby's, we only wake logical > >> walsenders at the time of ApplyWalRecord() then do we need the new > >> conditional variable enhancement being discussed, and if so, why? > >> > > > > Thank you both for this new idea and discussion. In that case I don't think > > we need the new CV API and the use of a CV anymore. As just said up-thread > > I'll submit > > a new proposal with this new approach. > > > > Please find enclosed V57 implementing the new approach in 0004. Regarding 0004 patch: @@ -2626,6 +2626,12 @@ InitWalSenderSlot(void) walsnd->sync_standby_priority = 0; walsnd->latch = &MyProc->procLatch; walsnd->replyTime = 0; + + if (MyDatabaseId == InvalidOid) + walsnd->kind = REPLICATION_KIND_PHYSICAL; + else + walsnd->kind = REPLICATION_KIND_LOGICAL; + I think we might want to set the replication kind when processing the START_REPLICATION command. The walsender using a logical replication slot is not necessarily streaming (e.g. when COPYing table data). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 2023-04-03 17:34:52 +0200, Alvaro Herrera wrote: > On 2023-Apr-03, Drouvot, Bertrand wrote: > > > +/* > > + * Report terminating or conflicting message. > > + * > > + * For both, logical conflict on standby and obsolete slot are handled. > > + */ > > +static void > > +ReportTerminationInvalidation(bool terminating, bool islogical, int pid, > > + NameData slotname, > > TransactionId *xid, > > + XLogRecPtr > > restart_lsn, XLogRecPtr oldestLSN) > > +{ > > > + if (terminating) > > + appendStringInfo(&err_msg, _("terminating process %d to release > > replication slot \"%s\""), > > +pid, > > +NameStr(slotname)); > > + else > > + appendStringInfo(&err_msg, _("invalidating")); > > + > > + if (islogical) > > + { > > + if (terminating) > > + appendStringInfo(&err_msg, _(" because it conflicts > > with recovery")); > > You can't build the strings this way, because it's not possible to put > the strings into the translation machinery. You need to write full > strings for each separate case instead, without appending other string > parts later. Hm? That's what the _'s do. We build strings in parts in other places too. You do need to use errmsg_internal() later, to prevent that format string from being translated as well. I'm not say that this is exactly the right way, don't get me wrong. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
Hi, On 4/3/23 8:10 AM, Drouvot, Bertrand wrote: Hi, On 4/3/23 7:35 AM, Amit Kapila wrote: On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis wrote: Agreed, even Bertrand and myself discussed the same approach few emails above. BTW, if we have this selective logic to wake physical/logical walsenders and for standby's, we only wake logical walsenders at the time of ApplyWalRecord() then do we need the new conditional variable enhancement being discussed, and if so, why? Thank you both for this new idea and discussion. In that case I don't think we need the new CV API and the use of a CV anymore. As just said up-thread I'll submit a new proposal with this new approach. Please find enclosed V57 implementing the new approach in 0004. With the new approach in place the TAP tests (0005) work like a charm (no delay and even after a promotion). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom 0d198484e008090a524562076326054be56935ca Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 14:08:11 + Subject: [PATCH v57 6/6] Doc changes describing details about logical decoding. Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/logicaldecoding.sgml | 22 ++ 1 file changed, 22 insertions(+) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml index 4e912b4bd4..3da254ed1f 100644 --- a/doc/src/sgml/logicaldecoding.sgml +++ b/doc/src/sgml/logicaldecoding.sgml @@ -316,6 +316,28 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU may consume changes from a slot at any given time. + + A logical replication slot can also be created on a hot standby. To prevent + VACUUM from removing required rows from the system + catalogs, hot_standby_feedback should be set on the + standby. In spite of that, if any required rows get removed, the slot gets + invalidated. It's highly recommended to use a physical slot between the primary + and the standby. Otherwise, hot_standby_feedback will work, but only while the + connection is alive (for example a node restart would break it). Existing + logical slots on standby also get invalidated if wal_level on primary is reduced to + less than 'logical'. + + + + For a logical slot to be created, it builds a historic snapshot, for which + information of all the currently running transactions is essential. On + primary, this information is available, but on standby, this information + has to be obtained from primary. So, slot creation may wait for some + activity to happen on the primary. If the primary is idle, creating a + logical slot on standby may take a noticeable time. One option to speed it + is to call the pg_log_standby_snapshot on the primary. + + Replication slots persist across crashes and know nothing about the state -- 2.34.1 From 4e392c06e39f36c4185780a21f2b90c7c6a97de4 Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Tue, 7 Feb 2023 09:04:12 + Subject: [PATCH v57 5/6] New TAP test for logical decoding on standby. In addition to the new TAP test, this commit introduces a new pg_log_standby_snapshot() function. The idea is to be able to take a snapshot of running transactions and write this to WAL without requesting for a (costly) checkpoint. Author: Craig Ringer (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/func.sgml| 15 + src/backend/access/transam/xlogfuncs.c| 32 + src/backend/catalog/system_functions.sql | 2 + src/include/catalog/pg_proc.dat | 3 + src/test/perl/PostgreSQL/Test/Cluster.pm | 37 + src/test/recovery/meson.build | 1 + .../t/035_standby_logical_decoding.pl | 705 ++ 7 files changed, 795 insertions(+) 3.1% src/backend/ 4.0% src/test/perl/PostgreSQL/Test/ 89.7% src/test/recovery/t/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 918a492234..939fb8019f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27034,6 +27034,21 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset prepared with . + + + + pg_log_standby_snapshot + +pg_log_standby_snapshot () +pg_lsn + + +Take a snapshot of running transactions and write this to WAL without +having to wait bgwriter or checkpointer to log one. This one is useful for +logical decoding on standby for which logical slot creation is hanging +until such a
Re: Minimal logical decoding on standbys
On 2023-Apr-03, Drouvot, Bertrand wrote: > +/* > + * Report terminating or conflicting message. > + * > + * For both, logical conflict on standby and obsolete slot are handled. > + */ > +static void > +ReportTerminationInvalidation(bool terminating, bool islogical, int pid, > + NameData slotname, > TransactionId *xid, > + XLogRecPtr > restart_lsn, XLogRecPtr oldestLSN) > +{ > + if (terminating) > + appendStringInfo(&err_msg, _("terminating process %d to release > replication slot \"%s\""), > + pid, > + NameStr(slotname)); > + else > + appendStringInfo(&err_msg, _("invalidating")); > + > + if (islogical) > + { > + if (terminating) > + appendStringInfo(&err_msg, _(" because it conflicts > with recovery")); You can't build the strings this way, because it's not possible to put the strings into the translation machinery. You need to write full strings for each separate case instead, without appending other string parts later. Thanks -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)
Re: Minimal logical decoding on standbys
Hi, On 4/2/23 10:10 PM, Andres Freund wrote: Hi, Btw, most of the patches have some things that pgindent will change (and some that my editor will highlight). It wouldn't hurt to run pgindent for the later patches... done. Pushed the WAL format change. Thanks! On 2023-04-02 10:27:45 +0200, Drouvot, Bertrand wrote: 5.3% doc/src/sgml/ 6.2% src/backend/access/transam/ 4.6% src/backend/replication/logical/ 55.6% src/backend/replication/ 4.4% src/backend/storage/ipc/ 6.9% src/backend/tcop/ 5.3% src/backend/ 3.8% src/include/catalog/ 5.3% src/include/replication/ I think it might be worth trying to split this up a bit. Okay. Split in 2 parts in V56 enclosed. One part to handle logical slot conflicts on standby, and one part to arrange for a new pg_stat_database_conflicts and pg_replication_slots field. restart_lsn = s->data.restart_lsn; - - /* -* If the slot is already invalid or is fresh enough, we don't need to -* do anything. -*/ - if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN) + slot_xmin = s->data.xmin; + slot_catalog_xmin = s->data.catalog_xmin; + + /* the slot has been invalidated (logical decoding conflict case) */ + if ((xid && ((LogicalReplicationSlotIsInvalid(s)) || + /* or the xid is valid and this is a non conflicting slot */ +(TransactionIdIsValid(*xid) && !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, *xid) || + /* or the slot has been invalidated (obsolete LSN case) */ + (!xid && (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= oldestLSN))) { This still looks nearly unreadable. I suggest moving comments outside of the if (), remove redundant parentheses, use a function to detect if the slot has been invalidated. I made it as simple as: /* * If the slot is already invalid or is a non conflicting slot, we don't * need to do anything. */ islogical = xid ? true : false; if (SlotIsInvalid(s, islogical) || SlotIsNotConflicting(s, islogical, xid, &oldestLSN)) in V56 attached. @@ -1329,16 +1345,45 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN, */ if (last_signaled_pid != active_pid) { + boolsend_signal = false; + + initStringInfo(&err_msg); + initStringInfo(&err_detail); + + appendStringInfo(&err_msg, "terminating process %d to release replication slot \"%s\"", +active_pid, + NameStr(slotname)); For this to be translatable you need to use _("message"). Thanks! + if (xid) + { + appendStringInfo(&err_msg, " because it conflicts with recovery"); + send_signal = true; + + if (TransactionIdIsValid(*xid)) + appendStringInfo(&err_detail, "The slot conflicted with xid horizon %u.", *xid); + else + appendStringInfo(&err_detail, "Logical decoding on standby requires wal_level to be at least logical on the primary server"); + } + else + { + appendStringInfo(&err_detail, "The slot's restart_lsn %X/%X exceeds the limit by %llu bytes.", + LSN_FORMAT_ARGS(restart_lsn), + (unsigned long long) (oldestLSN - restart_lsn)); + } + ereport(LOG, - errmsg("terminating process %d to release replication slot \"%s\"", - active_pid, NameStr(slotname)), - errdetail("The slot's restart_lsn %X/%X exceeds the limit by %llu bytes.", - LSN_FORMAT_ARGS(restart_lsn), - (unsigned long long) (oldestLSN - restart_lsn)), - errhint("You might need to increase max_slot_wal_keep_size.")); - -
Re: Minimal logical decoding on standbys
On Mon, Apr 3, 2023 at 4:39 AM Alvaro Herrera wrote: > > > From 56a9559555918a99c202a0924f7b2ede9de4e75d Mon Sep 17 00:00:00 2001 > > From: bdrouvotAWS > > Date: Tue, 7 Feb 2023 08:59:47 + > > Subject: [PATCH v52 3/6] Allow logical decoding on standby. > > > > Allow a logical slot to be created on standby. Restrict its usage > > or its creation if wal_level on primary is less than logical. > > During slot creation, it's restart_lsn is set to the last replayed > > LSN. Effectively, a logical slot creation on standby waits for an > > xl_running_xact record to arrive from primary. > > Hmm, not sure if it really applies here, but this sounds similar to > issues with track_commit_timestamps: namely, if the primary has it > enabled and you start a standby with it enabled, that's fine; but if the > primary is later shut down (but the standby isn't) and then the primary > restarted with a lesser value, then the standby would misbehave without > any obvious errors. > IIUC, the patch deals it by invalidating logical slots while replaying the XLOG_PARAMETER_CHANGE record on standby. Then later during decoding, if it encounters XLOG_PARAMETER_CHANGE, and wal_level from primary has been reduced, it will return an error. There is a race condition here as explained in the patch as follows: + /* + * If wal_level on primary is reduced to less than logical, then we + * want to prevent existing logical slots from being used. + * Existing logical slots on standby get invalidated when this WAL + * record is replayed; and further, slot creation fails when the + * wal level is not sufficient; but all these operations are not + * synchronized, so a logical slot may creep in while the wal_level + * is being reduced. Hence this extra check. + */ + if (xlrec->wal_level < WAL_LEVEL_LOGICAL) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("logical decoding on standby requires " + "wal_level >= logical on master"))); Now, during this race condition, say not only does a logical slot creep in but also one tries to decode WAL using the same then some misbehavior is expected. I have not tried this so not sure if this is really a problem but are you worried about something along those lines? > If that is a real problem, then perhaps you can > solve it by copying some of the logic from track_commit_timestamps, > which took a large number of iterations to get right. > IIUC, track_commit_timestamps deactivates the CommitTs module (by using state in the shared memory) when replaying the XLOG_PARAMETER_CHANGE record. Then later using that state it gives an error from the appropriate place in the CommitTs module. If my understanding is correct then that appears to be a better design than what the patch is currently doing. Also, the error message used in error_commit_ts_disabled() seems to be better than the current one. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
Hi, On 4/3/23 7:35 AM, Amit Kapila wrote: On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis wrote: On Fri, 2023-03-31 at 02:50 -0700, Jeff Davis wrote: But if the ConditionVariableEventSleep() API is added, then I think we should change the non-recovery case to use a CV as well for consistency, and it would avoid the need for WalSndWakeup(). It seems like what we ultimately want is for WalSndWakeup() to selectively wake up physical and/or logical walsenders depending on the caller. For instance: WalSndWakeup(bool physical, bool logical) The callers: * On promotion, StartupXLog would call: - WalSndWakeup(true, true) * XLogFlush/XLogBackgroundFlush/XLogWalRcvFlush would call: - WalSndWakeup(true, !RecoveryInProgress()) * ApplyWalRecord would call: - WalSndWakeup(switchedTLI, switchedTLI || RecoveryInProgress()) There seem to be two approaches to making that work: 1. Use two ConditionVariables, and WalSndWakeup would broadcast to one or both depending on its arguments. 2. Have a "replicaiton_kind" variable in WalSnd (either set based on MyDatabaseId==InvalidOid, or set at START_REPLICATION time) to indicate whether it's a physical or logical walsender. WalSndWakeup would wake up the right walsenders based on its arguments. #2 seems simpler at least for now. Would that work? Agreed, even Bertrand and myself discussed the same approach few emails above. BTW, if we have this selective logic to wake physical/logical walsenders and for standby's, we only wake logical walsenders at the time of ApplyWalRecord() then do we need the new conditional variable enhancement being discussed, and if so, why? Thank you both for this new idea and discussion. In that case I don't think we need the new CV API and the use of a CV anymore. As just said up-thread I'll submit a new proposal with this new approach. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 4/3/23 7:20 AM, Amit Kapila wrote: On Mon, Apr 3, 2023 at 1:31 AM Jeff Davis wrote: On Sun, 2023-04-02 at 10:11 +0200, Drouvot, Bertrand wrote: I was thinking that, if a new LogicalWalSndWakeup() replaces "ConditionVariableBroadcast(&XLogRecoveryCtl->replayedCV);" in ApplyWalRecord() then, it could be possible that some walsender(s) are requested to wake up while they are actually doing decoding (but I might be wrong). I don't think that's a problem, right? Agreed, I also don't see a problem because of the reason you mentioned below that if the latch is already set, we won't do anything in SetLatch. Thanks for the feedback, I do agree too after Jeff's and your explanation. We are concerned about wakeups when they happen repeatedly when there's no work to do, or when the wakeup doesn't happen when it should (and we need to wait for a timeout). Currently, we wake up walsenders only after writing some WAL records at the time of flush, so won't it be better to wake up only after applying some WAL records rather than after applying each record? Yeah that would be better. Why? If the walsender is asleep, and there's work to be done, why not wake it up? I think we can wake it up when there is work to be done even if the work unit is smaller. The reason why I mentioned waking up the walsender only after processing some records is to avoid the situation where it may not need to wait again after decoding very few records. But probably the logic in WalSndWaitForWal() will help us to exit before starting to wait by checking the replay location. Okay, I'll re-write the sub-patch related to the startup/walsender corner case with this new approach. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
On Mon, Apr 3, 2023 at 4:26 AM Jeff Davis wrote: > > On Fri, 2023-03-31 at 02:50 -0700, Jeff Davis wrote: > > But if the ConditionVariableEventSleep() API is added, then I think > > we > > should change the non-recovery case to use a CV as well for > > consistency, and it would avoid the need for WalSndWakeup(). > > It seems like what we ultimately want is for WalSndWakeup() to > selectively wake up physical and/or logical walsenders depending on the > caller. For instance: > >WalSndWakeup(bool physical, bool logical) > > The callers: > > * On promotion, StartupXLog would call: > - WalSndWakeup(true, true) > * XLogFlush/XLogBackgroundFlush/XLogWalRcvFlush would call: > - WalSndWakeup(true, !RecoveryInProgress()) > * ApplyWalRecord would call: > - WalSndWakeup(switchedTLI, switchedTLI || RecoveryInProgress()) > > There seem to be two approaches to making that work: > > 1. Use two ConditionVariables, and WalSndWakeup would broadcast to one > or both depending on its arguments. > > 2. Have a "replicaiton_kind" variable in WalSnd (either set based on > MyDatabaseId==InvalidOid, or set at START_REPLICATION time) to indicate > whether it's a physical or logical walsender. WalSndWakeup would wake > up the right walsenders based on its arguments. > > #2 seems simpler at least for now. Would that work? > Agreed, even Bertrand and myself discussed the same approach few emails above. BTW, if we have this selective logic to wake physical/logical walsenders and for standby's, we only wake logical walsenders at the time of ApplyWalRecord() then do we need the new conditional variable enhancement being discussed, and if so, why? -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
On Mon, Apr 3, 2023 at 1:31 AM Jeff Davis wrote: > > On Sun, 2023-04-02 at 10:11 +0200, Drouvot, Bertrand wrote: > > I was thinking that, if a new LogicalWalSndWakeup() replaces > > "ConditionVariableBroadcast(&XLogRecoveryCtl->replayedCV);" > > in ApplyWalRecord() then, it could be possible that some walsender(s) > > are requested to wake up while they are actually doing decoding (but > > I might be wrong). > > I don't think that's a problem, right? > Agreed, I also don't see a problem because of the reason you mentioned below that if the latch is already set, we won't do anything in SetLatch. > We are concerned about wakeups when they happen repeatedly when there's > no work to do, or when the wakeup doesn't happen when it should (and we > need to wait for a timeout). > > > > > > > Currently, we wake up walsenders only after writing some WAL > > > records > > > at the time of flush, so won't it be better to wake up only after > > > applying some WAL records rather than after applying each record? > > > > Yeah that would be better. > > Why? If the walsender is asleep, and there's work to be done, why not > wake it up? > I think we can wake it up when there is work to be done even if the work unit is smaller. The reason why I mentioned waking up the walsender only after processing some records is to avoid the situation where it may not need to wait again after decoding very few records. But probably the logic in WalSndWaitForWal() will help us to exit before starting to wait by checking the replay location. -- With Regards, Amit Kapila.
Re: Minimal logical decoding on standbys
> From 56a9559555918a99c202a0924f7b2ede9de4e75d Mon Sep 17 00:00:00 2001 > From: bdrouvotAWS > Date: Tue, 7 Feb 2023 08:59:47 + > Subject: [PATCH v52 3/6] Allow logical decoding on standby. > > Allow a logical slot to be created on standby. Restrict its usage > or its creation if wal_level on primary is less than logical. > During slot creation, it's restart_lsn is set to the last replayed > LSN. Effectively, a logical slot creation on standby waits for an > xl_running_xact record to arrive from primary. Hmm, not sure if it really applies here, but this sounds similar to issues with track_commit_timestamps: namely, if the primary has it enabled and you start a standby with it enabled, that's fine; but if the primary is later shut down (but the standby isn't) and then the primary restarted with a lesser value, then the standby would misbehave without any obvious errors. If that is a real problem, then perhaps you can solve it by copying some of the logic from track_commit_timestamps, which took a large number of iterations to get right. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)
Re: Minimal logical decoding on standbys
On Sun, 2023-04-02 at 15:29 -0700, Andres Freund wrote: > I agree that the *wait* has to go through condition_variable.c, but > it doesn't > seem right that creation of the WES needs to go through > condition_variable.c. The kind of WES required by a CV is an implementation detail, so I was concerned about making too many assumptions across different APIs. But what I ended up with is arguably not better, so perhaps I should do it your way and then just have some comments about what assumptions are being made? > The only thing that ConditionVariableEventSleep() seems to require is > that the > WES is waiting for MyLatch. You don't even need a separate WES for > that, the > already existing WES should suffice. By "already existing" WES, I assume you mean FeBeWaitSet? Yes, that mostly matches, but it uses WL_POSTMASTER_DEATH instead of WL_EXIT_ON_PM_DEATH, so I'd need to handle PM death in condition_variable.c. That's trivial to do, though. Regards, Jeff Davis
Re: Minimal logical decoding on standbys
On Fri, 2023-03-31 at 02:50 -0700, Jeff Davis wrote: > But if the ConditionVariableEventSleep() API is added, then I think > we > should change the non-recovery case to use a CV as well for > consistency, and it would avoid the need for WalSndWakeup(). It seems like what we ultimately want is for WalSndWakeup() to selectively wake up physical and/or logical walsenders depending on the caller. For instance: WalSndWakeup(bool physical, bool logical) The callers: * On promotion, StartupXLog would call: - WalSndWakeup(true, true) * XLogFlush/XLogBackgroundFlush/XLogWalRcvFlush would call: - WalSndWakeup(true, !RecoveryInProgress()) * ApplyWalRecord would call: - WalSndWakeup(switchedTLI, switchedTLI || RecoveryInProgress()) There seem to be two approaches to making that work: 1. Use two ConditionVariables, and WalSndWakeup would broadcast to one or both depending on its arguments. 2. Have a "replicaiton_kind" variable in WalSnd (either set based on MyDatabaseId==InvalidOid, or set at START_REPLICATION time) to indicate whether it's a physical or logical walsender. WalSndWakeup would wake up the right walsenders based on its arguments. #2 seems simpler at least for now. Would that work? Regards, Jeff Davis
Re: Minimal logical decoding on standbys
Hi, On 2023-04-02 15:15:44 -0700, Jeff Davis wrote: > On Sun, 2023-04-02 at 14:35 -0700, Andres Freund wrote: > > Why not offer a function to add a CV to a WES? It seems somehow odd > > to require > > going through condition_variable.c to create a WES. > > I agree that it's a bit odd, but remember that after waiting on a CV's > latch, it needs to re-insert itself into the CV's wait list. > > A WaitEventSetWait() can't do that, unless we move the details of re- > adding to the wait list into latch.c. I considered that, but latch.c > already implements the APIs for WaitEventSet and Latch, so it felt > complex to also make it responsible for ConditionVariable. I agree that the *wait* has to go through condition_variable.c, but it doesn't seem right that creation of the WES needs to go through condition_variable.c. The only thing that ConditionVariableEventSleep() seems to require is that the WES is waiting for MyLatch. You don't even need a separate WES for that, the already existing WES should suffice. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
On Sun, 2023-04-02 at 14:35 -0700, Andres Freund wrote: > Why not offer a function to add a CV to a WES? It seems somehow odd > to require > going through condition_variable.c to create a WES. I agree that it's a bit odd, but remember that after waiting on a CV's latch, it needs to re-insert itself into the CV's wait list. A WaitEventSetWait() can't do that, unless we move the details of re- adding to the wait list into latch.c. I considered that, but latch.c already implements the APIs for WaitEventSet and Latch, so it felt complex to also make it responsible for ConditionVariable. I'm open to suggestion. Regards, Jeff Davis
Re: Minimal logical decoding on standbys
Hi, On 2023-03-31 02:44:33 -0700, Jeff Davis wrote: > From 2f05cab9012950ae9290fccbb6366d50fc01553e Mon Sep 17 00:00:00 2001 > From: Jeff Davis > Date: Wed, 1 Mar 2023 20:02:42 -0800 > Subject: [PATCH v2] Introduce ConditionVariableEventSleep(). > > The new API takes a WaitEventSet which can include socket events. The > WaitEventSet must have been created by > ConditionVariableWaitSetCreate(), another new function, so that it > includes the wait events necessary for a condition variable. Why not offer a function to add a CV to a WES? It seems somehow odd to require going through condition_variable.c to create a WES. Greetings, Andres Freund
Re: Minimal logical decoding on standbys
On Fri, 2023-03-31 at 02:44 -0700, Jeff Davis wrote: > Thank you, done. I think the nearby line was also wrong, returning > true > when there was no timeout. I combined the lines and got rid of the > early return so it can check the list and timeout condition like > normal. Attached. On second (third?) thought, I think I was right the first time. It passes the flag WL_EXIT_ON_PM_DEATH (included in the ConditionVariableWaitSet), so a WL_POSTMASTER_DEATH event should not be returned. Also, I think the early return is correct. The current code in ConditionVariableTimedSleep() still checks the wait list even if WaitLatch() returns WL_TIMEOUT (it ignores the return), but I don't see why it can't early return true. For a socket event in ConditionVariableEventSleep() I think it should early return false. Regards, Jeff Davis
Re: Minimal logical decoding on standbys
Hi, Btw, most of the patches have some things that pgindent will change (and some that my editor will highlight). It wouldn't hurt to run pgindent for the later patches... Pushed the WAL format change. On 2023-04-02 10:27:45 +0200, Drouvot, Bertrand wrote: > During WAL replay on standby, when slot conflict is identified, > invalidate such slots. Also do the same thing if wal_level on the primary > server > is reduced to below logical and there are existing logical slots > on standby. Introduce a new ProcSignalReason value for slot > conflict recovery. Arrange for a new pg_stat_database_conflicts field: > confl_active_logicalslot. > > Add a new field "conflicting" in pg_replication_slots. > > Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot > Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes > Mello, > Bharath Rupireddy > --- > doc/src/sgml/monitoring.sgml | 11 + > doc/src/sgml/system-views.sgml| 10 + > src/backend/access/gist/gistxlog.c| 2 + > src/backend/access/hash/hash_xlog.c | 1 + > src/backend/access/heap/heapam.c | 3 + > src/backend/access/nbtree/nbtxlog.c | 2 + > src/backend/access/spgist/spgxlog.c | 1 + > src/backend/access/transam/xlog.c | 20 +- > src/backend/catalog/system_views.sql | 6 +- > .../replication/logical/logicalfuncs.c| 13 +- > src/backend/replication/slot.c| 189 ++ > src/backend/replication/slotfuncs.c | 16 +- > src/backend/replication/walsender.c | 7 + > src/backend/storage/ipc/procsignal.c | 3 + > src/backend/storage/ipc/standby.c | 13 +- > src/backend/tcop/postgres.c | 28 +++ > src/backend/utils/activity/pgstat_database.c | 4 + > src/backend/utils/adt/pgstatfuncs.c | 3 + > src/include/catalog/pg_proc.dat | 11 +- > src/include/pgstat.h | 1 + > src/include/replication/slot.h| 14 +- > src/include/storage/procsignal.h | 1 + > src/include/storage/standby.h | 2 + > src/test/regress/expected/rules.out | 8 +- > 24 files changed, 308 insertions(+), 61 deletions(-) >5.3% doc/src/sgml/ >6.2% src/backend/access/transam/ >4.6% src/backend/replication/logical/ > 55.6% src/backend/replication/ >4.4% src/backend/storage/ipc/ >6.9% src/backend/tcop/ >5.3% src/backend/ >3.8% src/include/catalog/ >5.3% src/include/replication/ I think it might be worth trying to split this up a bit. > restart_lsn = s->data.restart_lsn; > - > - /* > - * If the slot is already invalid or is fresh enough, we don't > need to > - * do anything. > - */ > - if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn >= > oldestLSN) > + slot_xmin = s->data.xmin; > + slot_catalog_xmin = s->data.catalog_xmin; > + > + /* the slot has been invalidated (logical decoding conflict > case) */ > + if ((xid && ((LogicalReplicationSlotIsInvalid(s)) || > + /* or the xid is valid and this is a non conflicting slot */ > + (TransactionIdIsValid(*xid) && > !(LogicalReplicationSlotXidsConflict(slot_xmin, slot_catalog_xmin, *xid) > || > + /* or the slot has been invalidated (obsolete LSN case) */ > + (!xid && (XLogRecPtrIsInvalid(restart_lsn) || > restart_lsn >= oldestLSN))) > { This still looks nearly unreadable. I suggest moving comments outside of the if (), remove redundant parentheses, use a function to detect if the slot has been invalidated. > @@ -1329,16 +1345,45 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, > XLogRecPtr oldestLSN, >*/ > if (last_signaled_pid != active_pid) > { > + boolsend_signal = false; > + > + initStringInfo(&err_msg); > + initStringInfo(&err_detail); > + > + appendStringInfo(&err_msg, "terminating process > %d to release replication slot \"%s\"", > + active_pid, > + > NameStr(slotname)); For this to be translatable you need to use _("message"). > + if (xid) > + { > + appendStringInfo(&err_msg, " because it > conflicts with recovery"); > + send_signal = true; > + > + if (TransactionIdIsValid(*xid)) > +
Re: Minimal logical decoding on standbys
On Sun, 2023-04-02 at 10:11 +0200, Drouvot, Bertrand wrote: > I was thinking that, if a new LogicalWalSndWakeup() replaces > "ConditionVariableBroadcast(&XLogRecoveryCtl->replayedCV);" > in ApplyWalRecord() then, it could be possible that some walsender(s) > are requested to wake up while they are actually doing decoding (but > I might be wrong). I don't think that's a problem, right? We are concerned about wakeups when they happen repeatedly when there's no work to do, or when the wakeup doesn't happen when it should (and we need to wait for a timeout). > > > > Currently, we wake up walsenders only after writing some WAL > > records > > at the time of flush, so won't it be better to wake up only after > > applying some WAL records rather than after applying each record? > > Yeah that would be better. Why? If the walsender is asleep, and there's work to be done, why not wake it up? If it's already doing work, and the latch gets repeatedly set, that doesn't look like a problem either. The comment on SetLatch() says: /* * Sets a latch and wakes up anyone waiting on it. * * This is cheap if the latch is already set, otherwise not so much. Regards, Jeff Davis
Re: Minimal logical decoding on standbys
Hi, On 4/1/23 6:50 AM, Amit Kapila wrote: On Fri, Mar 31, 2023 at 7:14 PM Drouvot, Bertrand wrote: That sounds like a good idea. We could imagine creating a LogicalWalSndWakeup() doing the Walsender(s) triage based on a new variable (as you suggest). But, it looks to me that we: - would need to go through the list of all the walsenders to do the triage - could wake up some logical walsender(s) unnecessary Why it could wake up unnecessarily? I was thinking that, if a new LogicalWalSndWakeup() replaces "ConditionVariableBroadcast(&XLogRecoveryCtl->replayedCV);" in ApplyWalRecord() then, it could be possible that some walsender(s) are requested to wake up while they are actually doing decoding (but I might be wrong). This extra work would occur during each replay. while with the CV, only the ones in the CV wait queue would be waked up. Currently, we wake up walsenders only after writing some WAL records at the time of flush, so won't it be better to wake up only after applying some WAL records rather than after applying each record? Yeah that would be better. Do you have any idea about how (and where) we could define the "some WAL records replayed"? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 2023-03-31 12:45:51 +0200, Drouvot, Bertrand wrote: > On 3/31/23 6:33 AM, Andres Freund wrote: > > Hi, > > > > On 2023-03-30 18:23:41 +0200, Drouvot, Bertrand wrote: > > > On 3/30/23 9:04 AM, Andres Freund wrote: > > > > I think this commit is ready to go. Unless somebody thinks differently, > > > > I > > > > think I might push it tomorrow. > > > > > > Great! Once done, I'll submit a new patch so that GlobalVisTestFor() can > > > make > > > use of the heap relation in vacuumRedirectAndPlaceholder() (which will be > > > possible > > > once 0001 is committed). > > > > Unfortunately I did find an issue doing a pre-commit review of the patch. > > > > The patch adds VISIBILITYMAP_IS_CATALOG_REL to xl_heap_visible.flags - but > > it > > does not remove the bit before calling visibilitymap_set(). > > > > This ends up corrupting the visibilitymap, because the we'll set a bit for > > another page. > > > > Oh I see, I did not think about that (not enough experience in the VM area). > Nice catch and thanks for pointing out! I pushed a commit just adding an assertion that only valid bits are passed in. > > I'm also thinking of splitting the patch into two. One patch to pass down > > the > > heap relation into the new places, and another for the rest. > > I think that makes sense. I don't know how far you've work on the split but > please > find attached V54 doing such a split + implementing your > VISIBILITYMAP_XLOG_VALID_BITS > suggestion. I pushed the pass-the-relation part. I removed an include of catalog.h that was in the patch - I suspect it might have slipped in there from a later patch in the series... I was a bit bothered by using 'heap' instead of 'table' in so many places (eventually we imo should standardize on the latter), but looking around the changed places, heap was used for things like buffers etc. So I left it at heap. Glad we split 0001 - the rest is a lot easier to review. Greetings, Andres Freund