Re: START_REPLICATION SLOT causing a crash in an assert build
On Thu, Oct 20, 2022 at 6:54 AM Andres Freund wrote:
>
> Hi,
>
> On 2022-10-13 15:57:28 +0900, Masahiko Sawada wrote:
> > I've attached an updated patch. I've added the common function to
> > start pg_recvlogical and wait for it to become active. Please review
> > it.
>
> > +# Start pg_recvlogical process and wait for it to become active.
> > +sub start_pg_recvlogical
> > +{
> > + my ($node, $slot_name, $create_slot) = @_;
> > +
> > + my @cmd = (
> > + 'pg_recvlogical', '-S', "$slot_name", '-d',
> > + $node->connstr('postgres'),
> > + '--start', '--no-loop', '-f', '-');
> > + push @cmd, '--create-slot' if $create_slot;
> > +
> > + # start pg_recvlogical process.
> > + my $pg_recvlogical = IPC::Run::start(@cmd);
> > +
> > + # Wait for the replication slot to become active.
> > + $node->poll_query_until('postgres',
> > + "SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE
> > slot_name = '$slot_name' AND active_pid IS NOT NULL)"
> > + ) or die "slot never became active";
> > +
> > + return $pg_recvlogical;
> > +}
>
> Because you never process the output from pg_recvlogical I think this test
> will fail if the pipe buffer is small (or more changes are made). I think
> either it needs to output to a file, or we need to process the output.

Okay, but how can we test this situation? As far as I tested, if we don't
specify the redirection of pg_recvlogical's output as above, pg_recvlogical's
stdout and stderr are output to the log file. So I could not reproduce the
issue you're concerned about. Which pipe do you refer to?

> A file seems the easier solution in this case, we don't really care about what
> changes are streamed to the client, right?
>
> > +$node = PostgreSQL::Test::Cluster->new('test2');
> > +$node->init(allows_streaming => 'logical');
> > +$node->start;
> > +$node->safe_psql('postgres', "CREATE TABLE test(i int)");
>
> Why are we creating a new cluster? Initdb takes a fair bit of time on some
> platforms, so this seems unnecessary?

Agreed.
Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
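The pipe-buffer concern discussed above can be demonstrated in miniature. A kernel pipe has a fixed capacity; once it fills and nothing drains the read end, the next write blocks, which is how a test that never consumes pg_recvlogical's stdout can stall. The sketch below (an illustration, not part of the patch; the function name is made up) fills a pipe in non-blocking mode so the final write fails with EAGAIN instead of hanging:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Fill a pipe's kernel buffer without a reader and return how many
 * bytes fit before write() would block.  Illustrates why a child
 * process writing to an undrained pipe eventually stalls. */
long unread_pipe_capacity(void)
{
    int fds[2];
    char chunk[4096] = {0};
    long total = 0;

    if (pipe(fds) != 0)
        return -1;

    /* Non-blocking, so the final write fails with EAGAIN instead of
     * blocking this demo the way the real test would hang. */
    fcntl(fds[1], F_SETFL, O_NONBLOCK);

    for (;;)
    {
        ssize_t n = write(fds[1], chunk, sizeof(chunk));

        if (n < 0)
        {
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                total = -1;     /* unexpected error */
            break;              /* buffer is full: writer would block */
        }
        total += n;
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

On Linux the default capacity is 64 kB, which is why the test only misbehaves once enough changes are streamed — redirecting pg_recvlogical's output to a file sidesteps the limit entirely.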
Re: START_REPLICATION SLOT causing a crash in an assert build
Hi,

On 2022-10-13 15:57:28 +0900, Masahiko Sawada wrote:
> I've attached an updated patch. I've added the common function to
> start pg_recvlogical and wait for it to become active. Please review
> it.

> +# Start pg_recvlogical process and wait for it to become active.
> +sub start_pg_recvlogical
> +{
> + my ($node, $slot_name, $create_slot) = @_;
> +
> + my @cmd = (
> + 'pg_recvlogical', '-S', "$slot_name", '-d',
> + $node->connstr('postgres'),
> + '--start', '--no-loop', '-f', '-');
> + push @cmd, '--create-slot' if $create_slot;
> +
> + # start pg_recvlogical process.
> + my $pg_recvlogical = IPC::Run::start(@cmd);
> +
> + # Wait for the replication slot to become active.
> + $node->poll_query_until('postgres',
> + "SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE
> slot_name = '$slot_name' AND active_pid IS NOT NULL)"
> + ) or die "slot never became active";
> +
> + return $pg_recvlogical;
> +}

Because you never process the output from pg_recvlogical I think this test
will fail if the pipe buffer is small (or more changes are made). I think
either it needs to output to a file, or we need to process the output.

A file seems the easier solution in this case, we don't really care about what
changes are streamed to the client, right?

> +$node = PostgreSQL::Test::Cluster->new('test2');
> +$node->init(allows_streaming => 'logical');
> +$node->start;
> +$node->safe_psql('postgres', "CREATE TABLE test(i int)");

Why are we creating a new cluster? Initdb takes a fair bit of time on some
platforms, so this seems unnecessary?

Greetings,

Andres Freund
Re: START_REPLICATION SLOT causing a crash in an assert build
On Thu, Oct 13, 2022 at 1:21 AM Andres Freund wrote:
>
> Hi,
>
> On 2022-10-11 17:10:52 +0900, Masahiko Sawada wrote:
> > +# Reset the replication slot statistics.
> > +$node->safe_psql('postgres',
> > + "SELECT pg_stat_reset_replication_slot('regression_slot');");
> > +my $result = $node->safe_psql('postgres',
> > + "SELECT * FROM pg_stat_replication_slots WHERE slot_name =
> > 'regrssion_slot'"
> > +);
>
> Typo in the slot name "regrssion_slot" instead of "regression_slot". We can't
> use * here, because that'll include the reset timestamp.

Fixed.

> > +# Teardown the node so the statistics is removed.
> > +$pg_recvlogical->kill_kill;
> > +$node->teardown_node;
> > +$node->start;
>
> ISTM that removing the file instead of shutting down the cluster with force
> would make it a more targeted test.

Agreed.

> > +# Check if the replication slot statistics have been removed.
> > +$result = $node->safe_psql('postgres',
> > + "SELECT * FROM pg_stat_replication_slots WHERE slot_name =
> > 'regrssion_slot'"
> > +);
> > +is($result, "", "replication slot statistics are removed");
>
> Same typo as above. We can't assert a specific result here either, because
> recvlogical will have processed a bunch of changes. Perhaps we could check at
> least that the reset time is NULL?

Agreed.

> > +# Test if the replication slot staistics continue to be accumulated even
> > after
>
> s/staistics/statistics/

Fixed.

I've attached an updated patch. I've added the common function to start
pg_recvlogical and wait for it to become active. Please review it.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

regression_test_for_replslot_stats_v2.patch
Description: Binary data
Re: START_REPLICATION SLOT causing a crash in an assert build
Hi,

On 2022-10-11 17:10:52 +0900, Masahiko Sawada wrote:
> +# Reset the replication slot statistics.
> +$node->safe_psql('postgres',
> + "SELECT pg_stat_reset_replication_slot('regression_slot');");
> +my $result = $node->safe_psql('postgres',
> + "SELECT * FROM pg_stat_replication_slots WHERE slot_name =
> 'regrssion_slot'"
> +);

Typo in the slot name "regrssion_slot" instead of "regression_slot". We can't
use * here, because that'll include the reset timestamp.

> +# Teardown the node so the statistics is removed.
> +$pg_recvlogical->kill_kill;
> +$node->teardown_node;
> +$node->start;

ISTM that removing the file instead of shutting down the cluster with force
would make it a more targeted test.

> +# Check if the replication slot statistics have been removed.
> +$result = $node->safe_psql('postgres',
> + "SELECT * FROM pg_stat_replication_slots WHERE slot_name =
> 'regrssion_slot'"
> +);
> +is($result, "", "replication slot statistics are removed");

Same typo as above. We can't assert a specific result here either, because
recvlogical will have processed a bunch of changes. Perhaps we could check at
least that the reset time is NULL?

> +# Test if the replication slot staistics continue to be accumulated even
> after

s/staistics/statistics/

Greetings,

Andres Freund
Re: START_REPLICATION SLOT causing a crash in an assert build
On Sun, Oct 9, 2022 at 2:42 AM Andres Freund wrote:
>
> On 2022-10-08 09:53:50 -0700, Andres Freund wrote:
> > On 2022-10-07 19:56:33 -0700, Andres Freund wrote:
> > > I'm planning to push this either later tonight (if I feel up to it after
> > > cooking dinner) or tomorrow morning PST, due to the release wrap deadline.
> >
> > I looked this over again, tested a bit more, and pushed the adjusted 15 and
> > master versions to github to get a CI run. Once that passes, as I expect,
> > I'll push them for real.
>
> Those passed and thus pushed.
>
> Thanks for the report, debugging and review everyone!

Thanks!

> I think we need at least the following tests for replslots:
> - a reset while decoding is ongoing works correctly
> - replslot stats continue to be accumulated after stats have been removed
>
> I wonder how much it'd take to teach isolationtester to handle the replication
> protocol...

I think we can do these tests by using pg_recvlogical in TAP tests. I've
attached a patch to do that.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

regression_test_for_replslot_stats.patch
Description: Binary data
Re: START_REPLICATION SLOT causing a crash in an assert build
On 10/8/22 1:40 PM, Andres Freund wrote:
> On 2022-10-08 09:53:50 -0700, Andres Freund wrote:
> > On 2022-10-07 19:56:33 -0700, Andres Freund wrote:
> > > I'm planning to push this either later tonight (if I feel up to it after
> > > cooking dinner) or tomorrow morning PST, due to the release wrap deadline.
> >
> > I looked this over again, tested a bit more, and pushed the adjusted 15 and
> > master versions to github to get a CI run. Once that passes, as I expect,
> > I'll push them for real.
>
> Those passed and thus pushed.
>
> Thanks for the report, debugging and review everyone!

Thanks for the quick turnaround! I've closed the open item.

Thanks,

Jonathan
Re: START_REPLICATION SLOT causing a crash in an assert build
On 2022-10-08 09:53:50 -0700, Andres Freund wrote:
> On 2022-10-07 19:56:33 -0700, Andres Freund wrote:
> > I'm planning to push this either later tonight (if I feel up to it after
> > cooking dinner) or tomorrow morning PST, due to the release wrap deadline.
>
> I looked this over again, tested a bit more, and pushed the adjusted 15 and
> master versions to github to get a CI run. Once that passes, as I expect, I'll
> push them for real.

Those passed and thus pushed.

Thanks for the report, debugging and review everyone!

I think we need at least the following tests for replslots:
- a reset while decoding is ongoing works correctly
- replslot stats continue to be accumulated after stats have been removed

I wonder how much it'd take to teach isolationtester to handle the replication
protocol...
Re: START_REPLICATION SLOT causing a crash in an assert build
Hi,

On 2022-10-07 19:56:33 -0700, Andres Freund wrote:
> I'm planning to push this either later tonight (if I feel up to it after
> cooking dinner) or tomorrow morning PST, due to the release wrap deadline.

I looked this over again, tested a bit more, and pushed the adjusted 15 and
master versions to github to get a CI run. Once that passes, as I expect, I'll
push them for real.

Greetings,

Andres Freund
Re: START_REPLICATION SLOT causing a crash in an assert build
Hi,

On 2022-10-07 12:00:56 -0700, Andres Freund wrote:
> On 2022-10-07 15:30:43 +0900, Kyotaro Horiguchi wrote:
> > The key point of this is this:
> >
> > + * XXX: I think there cannot actually be data from an older slot
> > + * here. After a crash we throw away the old stats data and if a slot is
> > + * dropped while shut down, we'll not load the slot data at startup.
> >
> > I think this is true. Assuming that we don't recreate or rename
> > objects that have stats after writing out stats, we won't have stats
> > for a different object with the same name.
>
> Thanks for thinking through this!
>
> > If we can rely on that fact, the existing check in pgstat_acquire_replslot()
> > becomes useless. Thus we don't need to store object name in stats entry. I
> > agree to that.
>
> I don't agree with this aspect. I think it's better to ensure that the stats
> object exists when acquiring a slot, rather than later, when reporting. It's a
> lot simpler to think through the lock nesting etc that way.
>
> I'd also eventually like to remove the stats that are currently kept
> separately in ReorderBuffer, and that will be easier/cheaper if we can rely on
> the stats objects to have already been created.

Here's a cleaned up version of my earlier prototype.

- I wrapped the access to ReplicationSlotCtl->replication_slots[i].data.name
  in a new function bool ReplicationSlotName(index, name). I'm not entirely
  happy with that name, as it sounds like a more general accessor than it is -
  I toyed with ReplicationSlotNameForIndex(), but that seemed somewhat
  unnecessary, I don't foresee a need for another name accessor. Anyone want
  to weigh in?

- Substantial comment and commit message polishing

- I'm planning to drop PgStat_StatReplSlotEntry.slotname and a
  PGSTAT_FILE_FORMAT_ID bump in master and to rename slotname to
  slotname_unused in 15.

- Self-review found a bug, I copy-pasted create=false in the call to
  pgstat_get_entry_ref() in pgstat_acquire_replslot().
This did *NOT* cause any test failures - clearly our test coverage in this
area is woefully inadequate.

- This patch does not contain a test for the fix. I think this can only be
  tested by a tap test starting pg_recvlogical in the background and checking
  pg_recvlogical's output. That type of test is notoriously hard to be
  reliable, so committing it shortly before the release is wrapped seems like
  a bad idea.

I manually verified that:
- a continually active slot reporting stats after pgstat_reset_replslot()
  works correctly (this is what crashed before)
- replslot stats reporting works correctly after stats were removed
- upgrading from pre-fix to post-fix preserves stats (when keeping slotname /
  not increasing the stats version, of course)

I'm planning to push this either later tonight (if I feel up to it after
cooking dinner) or tomorrow morning PST, due to the release wrap deadline.

Greetings,

Andres Freund

>From 4c13dc7657fb7b9a853055b693564706847b7841 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Fri, 7 Oct 2022 19:37:48 -0700
Subject: [PATCH v4] pgstat: Prevent stats reset from corrupting slotname by
 removing slotname

Previously PgStat_StatReplSlotEntry contained the slotname, which was mainly
used when writing out the stats during shutdown, to identify the slot in the
serialized data (at runtime the index in
ReplicationSlotCtl->replication_slots is used, but that can change during a
restart).

Unfortunately the slotname was overwritten when the slot's stats were reset.
That turned out to only cause "real" problems if the slot was active during
the reset, triggering an assertion failure at the next
pgstat_report_replslot(). In other paths the stats were re-initialized during
pgstat_acquire_replslot().

Fix this by removing slotname. Instead we can get the slot's name from the
slot itself. Besides fixing a bug, this also is architecturally cleaner (a
name is not really statistics).
This is safe because stats, for a slot removed while shut down, will not be
restored at startup.

In 15 the slotname is not removed, but renamed, to avoid changing the stats
format. In master, bump PGSTAT_FILE_FORMAT_ID.

This commit does not contain a test for the fix. I think this can only be
tested by a tap test starting pg_recvlogical in the background and checking
pg_recvlogical's output. That type of test is notoriously hard to be
reliable, so committing it shortly before the release is wrapped seems like a
bad idea.

Reported-by: Jaime Casanova
Author: Andres Freund
Reviewed-by: Masahiko Sawada
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/YxfagaTXUNa9ggLb@ahch-to
Backpatch: 15-, where the bug was introduced in 5891c7a8ed8f
---
 src/include/pgstat.h                | 8 ++-
 src/include/replication/slot.h      | 1 +
 src/include/utils/pgstat_internal.h | 5 +-
 src/backend/replication/slot.c      | 28 ++
 src/backend/utils/activity/pgstat.c | 2
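The accessor sketched above — looking a slot's name up by its index at serialization time, rather than storing a copy inside the stats entry — can be modelled in miniature. The struct, array, and bounds below are illustrative stand-ins, not the real PostgreSQL definitions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the shared slot array (illustrative only). */
#define MAX_SLOTS 8
#define NAMELEN   64

typedef struct Slot
{
    bool in_use;
    char name[NAMELEN];
} Slot;

Slot slot_array[MAX_SLOTS];

/* Copy the slot's name into *out.  Returns false if the index does
 * not refer to an allocated slot, so callers can skip writing out a
 * serialized entry for it. */
bool slot_name_by_index(int index, char *out, size_t outlen)
{
    if (index < 0 || index >= MAX_SLOTS || !slot_array[index].in_use)
        return false;

    strncpy(out, slot_array[index].name, outlen - 1);
    out[outlen - 1] = '\0';
    return true;
}
```

Because the index is only stable while the server runs, the name (not the index) is what identifies the entry on disk — which is exactly why a name accessor is needed at shutdown.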
Re: START_REPLICATION SLOT causing a crash in an assert build
Hi,

On 2022-10-07 15:30:43 +0900, Kyotaro Horiguchi wrote:
> At Fri, 7 Oct 2022 12:14:40 +0900, Masahiko Sawada wrote in
> > > What about if we go the other direction - simply remove the name from the
> > > stats entry at all. I don't actually think we need it anymore. Unless I am
> > > missing something right now - entirely possible! - the danger that
> > > pgstat_acquire_replslot() mentions doesn't actually exist [anymore].
> > > After a
> > > crash we throw away the old stats data and if a slot is dropped while shut
> > > down, we'll not load the slot data at startup.
>
> The key point of this is this:
>
> + * XXX: I think there cannot actually be data from an older slot
> + * here. After a crash we throw away the old stats data and if a slot is
> + * dropped while shut down, we'll not load the slot data at startup.
>
> I think this is true. Assuming that we don't recreate or rename
> objects that have stats after writing out stats, we won't have stats
> for a different object with the same name.

Thanks for thinking through this!

> If we can rely on that fact, the existing check in pgstat_acquire_replslot()
> becomes useless. Thus we don't need to store object name in stats entry. I
> agree to that.

I don't agree with this aspect. I think it's better to ensure that the stats
object exists when acquiring a slot, rather than later, when reporting. It's a
lot simpler to think through the lock nesting etc that way.

I'd also eventually like to remove the stats that are currently kept
separately in ReorderBuffer, and that will be easier/cheaper if we can rely on
the stats objects to have already been created.

Greetings,

Andres Freund
Re: START_REPLICATION SLOT causing a crash in an assert build
At Fri, 7 Oct 2022 12:14:40 +0900, Masahiko Sawada wrote in
> > What about if we go the other direction - simply remove the name from the
> > stats entry at all. I don't actually think we need it anymore. Unless I am
> > missing something right now - entirely possible! - the danger that
> > pgstat_acquire_replslot() mentions doesn't actually exist [anymore]. After a
> > crash we throw away the old stats data and if a slot is dropped while shut
> > down, we'll not load the slot data at startup.

The key point of this is this:

+ * XXX: I think there cannot actually be data from an older slot
+ * here. After a crash we throw away the old stats data and if a slot is
+ * dropped while shut down, we'll not load the slot data at startup.

I think this is true. Assuming that we don't recreate or rename objects that
have stats after writing out stats, we won't have stats for a different
object with the same name.

If we can rely on that fact, the existing check in pgstat_acquire_replslot()
becomes useless. Thus we don't need to store object name in stats entry. I
agree to that.

> +1. I think it works. Since the replication slot index doesn't change
> during server running we can fetch the name from
> ReplicationSlotCtl->replication_slots.

That access also seems safe from a slightly different angle: both the
checkpointer and walsender properly initialize ReplicationSlotCtl.

> If we don't need the name in stats entry, pgstat_acquire_replslot() is
> no longer necessary?

I think so. The entry will be created at the first report.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: START_REPLICATION SLOT causing a crash in an assert build
On Fri, Oct 7, 2022 at 8:00 AM Andres Freund wrote:
>
> Hi,
>
> On 2022-10-06 14:10:46 +0900, Kyotaro Horiguchi wrote:
> > +1. FWIW, the attached is an example of what it looks like if we
> > avoid file format change.
>
> What about if we go the other direction - simply remove the name from the
> stats entry at all. I don't actually think we need it anymore. Unless I am
> missing something right now - entirely possible! - the danger that
> pgstat_acquire_replslot() mentions doesn't actually exist [anymore]. After a
> crash we throw away the old stats data and if a slot is dropped while shut
> down, we'll not load the slot data at startup.

+1. I think it works. Since the replication slot index doesn't change during
server running we can fetch the name from
ReplicationSlotCtl->replication_slots.

If we don't need the name in stats entry, pgstat_acquire_replslot() is no
longer necessary?

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: START_REPLICATION SLOT causing a crash in an assert build
Hi,

On 2022-10-06 14:10:46 +0900, Kyotaro Horiguchi wrote:
> +1. FWIW, the attached is an example of what it looks like if we
> avoid file format change.

What about if we go the other direction - simply remove the name from the
stats entry at all. I don't actually think we need it anymore. Unless I am
missing something right now - entirely possible! - the danger that
pgstat_acquire_replslot() mentions doesn't actually exist [anymore]. After a
crash we throw away the old stats data and if a slot is dropped while shut
down, we'll not load the slot data at startup.

The attached is a rough prototype, but should be enough for Jaime to test and
Horiguchi to test / review / think about.

Amit, I CCed you, since you've thought a bunch about the dangers in this area
too.

Greetings,

Andres Freund

>From f34c6ed510fbe4c079a0c80e9040cac77c60fd19 Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Thu, 6 Oct 2022 15:50:36 -0700
Subject: [PATCH v3] pgstat: Remove slotname from PgStat_StatReplSlotEntry

This fixes a bug where the slotname is reset as part of
pgstat_reset_replslot() subsequently causing an allocation failure in
pgstat_report_replslot(). It also is architecturally nicer, because having
the name as part of the stats isn't quite right, on account of not actually
being stats :)

As far as I can tell we actually have worked around all the dangers that
lead us to keeping the stats name part of the stats.
Author: Reviewed-by: Discussion: https://postgr.es/m/ Backpatch: --- src/include/pgstat.h | 2 ++ src/include/utils/pgstat_internal.h | 5 +-- src/backend/utils/activity/pgstat.c | 2 +- src/backend/utils/activity/pgstat_replslot.c | 36 +--- 4 files changed, 14 insertions(+), 31 deletions(-) diff --git a/src/include/pgstat.h b/src/include/pgstat.h index ad7334a0d21..2a85fa0369a 100644 --- a/src/include/pgstat.h +++ b/src/include/pgstat.h @@ -321,7 +321,9 @@ typedef struct PgStat_StatFuncEntry typedef struct PgStat_StatReplSlotEntry { +#if IN_PG_15_ONLY_UNUSED NameData slotname; +#endif PgStat_Counter spill_txns; PgStat_Counter spill_count; PgStat_Counter spill_bytes; diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h index 40a36028554..627c1389e4c 100644 --- a/src/include/utils/pgstat_internal.h +++ b/src/include/utils/pgstat_internal.h @@ -242,7 +242,8 @@ typedef struct PgStat_KindInfo /* * For variable-numbered stats with named_on_disk. Optional. */ - void (*to_serialized_name) (const PgStatShared_Common *header, NameData *name); + void (*to_serialized_name) (const PgStat_HashKey *key, + const PgStatShared_Common *header, NameData *name); bool (*from_serialized_name) (const NameData *name, PgStat_HashKey *key); /* @@ -567,7 +568,7 @@ extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref); */ extern void pgstat_replslot_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts); -extern void pgstat_replslot_to_serialized_name_cb(const PgStatShared_Common *header, NameData *name); +extern void pgstat_replslot_to_serialized_name_cb(const PgStat_HashKey *key, const PgStatShared_Common *header, NameData *name); extern bool pgstat_replslot_from_serialized_name_cb(const NameData *name, PgStat_HashKey *key); diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index 609f0b1ad86..1b97597f17c 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c 
@@ -1367,7 +1367,7 @@ pgstat_write_statsfile(void) /* stats entry identified by name on disk (e.g. slots) */ NameData name; - kind_info->to_serialized_name(shstats, &name); + kind_info->to_serialized_name(&ps->key, shstats, &name); fputc('N', fpout); write_chunk_s(fpout, &ps->key.kind); diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c index 2039ac3147f..793f26fc950 100644 --- a/src/backend/utils/activity/pgstat_replslot.c +++ b/src/backend/utils/activity/pgstat_replslot.c @@ -82,12 +82,6 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats; statent = &shstatent->stats; - /* - * Any mismatch should have been fixed in pgstat_create_replslot() or - * pgstat_acquire_replslot(). - */ - Assert(namestrcmp(&statent->slotname, NameStr(slot->data.name)) == 0); - /* Update the replication slot statistics */ #define REPLSLOT_ACC(fld) statent->fld += repSlotStat->fld REPLSLOT_ACC(spill_txns); @@ -124,7 +118,6 @@ pgstat_create_replslot(ReplicationSlot *slot) * if we previously crashed after dropping a slot. */ memset(&shstatent->stats, 0, sizeof(shstatent->stats)); - namestrcpy(&shstatent->stats.slotname, NameStr(slot->data.name)); pgstat_unlock_entry(entry_ref); } @@ -135,27 +128,13 @@ pgstat_create_replslot(ReplicationSlot *slot) void pgstat_acq
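The failure mode the patch above removes can be sketched in miniature. These are toy structs, not the real PostgreSQL definitions: the slot's name lived inside the same struct as its counters, so a whole-struct reset clobbered the slot's identity along with the numbers, while the fixed shape keeps the identity out of the reset path (or, as in the patch, drops it from the stats entry entirely):

```c
#include <string.h>

/* Buggy shape: identity and statistics share one struct. */
typedef struct SlotStatsBuggy
{
    char name[64];       /* identity - should survive a reset */
    long spill_txns;     /* statistics - should be zeroed */
} SlotStatsBuggy;

void reset_buggy(SlotStatsBuggy *s)
{
    /* Whole-struct reset: wipes the name too. */
    memset(s, 0, sizeof(*s));
}

/* Fixed shape: only the nested stats part is ever reset. */
typedef struct SlotStatsFixed
{
    char name[64];
    struct
    {
        long spill_txns;
    } stats;
} SlotStatsFixed;

void reset_fixed(SlotStatsFixed *s)
{
    memset(&s->stats, 0, sizeof(s->stats));
}
```

With the buggy shape, any code that later asserts the stored name matches the slot's actual name (as pgstat_report_replslot() did) trips over the zeroed field — but only if nothing re-initialized the entry in between, which is why the crash needed an active slot to reproduce.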
Re: START_REPLICATION SLOT causing a crash in an assert build
On 10/6/22 1:10 AM, Kyotaro Horiguchi wrote:
> At Thu, 6 Oct 2022 13:44:43 +0900, Michael Paquier wrote in
> > On Wed, Oct 05, 2022 at 11:24:57PM -0400, Jonathan S. Katz wrote:
> > > On 10/5/22 8:44 PM, Andres Freund wrote:
> > > > I have two ideas how to fix it. As a design constraint, I'd be
> > > > interested in the RMTs opinion on the following:
> > > > Is a cleaner fix that changes the stats format (i.e. existing stats
> > > > will be thrown away when upgrading) or one that doesn't change the
> > > > stats format preferrable?
> > >
> > > [My opinion, will let Michael/John chime in]
> > >
> > > Ideally we would keep the stats on upgrade -- I think that's the better
> > > user experience.
> >
> > The release has not happened yet, so I would be fine to remain flexible
> > and bump again PGSTAT_FILE_FORMAT_ID so as we have the cleanest approach
> > in place for the release and the future.

Yes, I agree with this.

> > At the end, we would throw automatically the data of a file that's marked
> > with a version that does not match with what we expect at load time, so
> > there's a limited impact on the user experience, except, well, losing
> > these stats, of course.

I'm fine with this.

> +1. FWIW, the attached is an example of what it looks like if we avoid file
> format change.

Thanks for the quick turnaround. I'll let others chime in on the review.

Thanks,

Jonathan
Re: START_REPLICATION SLOT causing a crash in an assert build
At Thu, 6 Oct 2022 13:44:43 +0900, Michael Paquier wrote in
> On Wed, Oct 05, 2022 at 11:24:57PM -0400, Jonathan S. Katz wrote:
> > On 10/5/22 8:44 PM, Andres Freund wrote:
> >> I have two ideas how to fix it. As a design constraint, I'd be interested
> >> in
> >> the RMTs opinion on the following:
> >> Is a cleaner fix that changes the stats format (i.e. existing stats will be
> >> thrown away when upgrading) or one that doesn't change the stats format
> >> preferrable?
> >
> > [My opinion, will let Michael/John chime in]
> >
> > Ideally we would keep the stats on upgrade -- I think that's the better user
> > experience.
>
> The release has not happened yet, so I would be fine to remain
> flexible and bump again PGSTAT_FILE_FORMAT_ID so as we have the
> cleanest approach in place for the release and the future. At the
> end, we would throw automatically the data of a file that's marked
> with a version that does not match with what we expect at load time,
> so there's a limited impact on the user experience, except, well,
> losing these stats, of course.

+1. FWIW, the attached is an example of what it looks like if we avoid file
format change.

regards.
-- Kyotaro Horiguchi NTT Open Source Software Center >From 4effe9279d6535191d7f5478bb76490dc8762d33 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 6 Oct 2022 14:06:04 +0900 Subject: [PATCH v2] If we avoid file format --- src/backend/utils/activity/pgstat.c | 9 +++-- src/backend/utils/activity/pgstat_replslot.c | 18 -- src/include/pgstat.h | 1 - src/include/utils/pgstat_internal.h | 3 +++ 4 files changed, 22 insertions(+), 9 deletions(-) diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index 609f0b1ad8..a03184c9ef 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -310,6 +310,7 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .reset_timestamp_cb = pgstat_replslot_reset_timestamp_cb, .to_serialized_name = pgstat_replslot_to_serialized_name_cb, + .set_name = pgstat_replslot_set_name_cb, .from_serialized_name = pgstat_replslot_from_serialized_name_cb, }, @@ -1523,6 +1524,8 @@ pgstat_read_statsfile(void) PgStat_HashKey key; PgStatShared_HashEntry *p; PgStatShared_Common *header; + const PgStat_KindInfo *kind_info = NULL; + NameData name = {0}; CHECK_FOR_INTERRUPTS(); @@ -1538,9 +1541,7 @@ pgstat_read_statsfile(void) else { /* stats entry identified by name on disk (e.g. 
slots) */ - const PgStat_KindInfo *kind_info = NULL; PgStat_Kind kind; - NameData name; if (!read_chunk_s(fpin, &kind)) goto error; @@ -1590,6 +1591,10 @@ pgstat_read_statsfile(void) pgstat_get_entry_len(key.kind))) goto error; + /* set name if it requires to do that separately */ + if (kind_info && kind_info->set_name && NameStr(name)[0]) + kind_info->set_name(header, &name); + break; } case 'E': diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c index 2039ac3147..9fa4df610d 100644 --- a/src/backend/utils/activity/pgstat_replslot.c +++ b/src/backend/utils/activity/pgstat_replslot.c @@ -86,7 +86,7 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re * Any mismatch should have been fixed in pgstat_create_replslot() or * pgstat_acquire_replslot(). */ - Assert(namestrcmp(&statent->slotname, NameStr(slot->data.name)) == 0); + Assert(namestrcmp(&shstatent->slotname, NameStr(slot->data.name)) == 0); /* Update the replication slot statistics */ #define REPLSLOT_ACC(fld) statent->fld += repSlotStat->fld @@ -124,7 +124,7 @@ pgstat_create_replslot(ReplicationSlot *slot) * if we previously crashed after dropping a slot. */ memset(&shstatent->stats, 0, sizeof(shstatent->stats)); - namestrcpy(&shstatent->stats.slotname, NameStr(slot->data.name)); + namestrcpy(&shstatent->slotname, NameStr(slot->data.name)); pgstat_unlock_entry(entry_ref); } @@ -148,11 +148,11 @@ pgstat_acquire_replslot(ReplicationSlot *slot) * NB: need to accept that there might be stats from an older slot, e.g. * if we previously crashed after dropping a slot. 
*/ - if (NameStr(statent->slotname)[0] == 0 || - namestrcmp(&statent->slotname, NameStr(slot->data.name)) != 0) + if (NameStr(shstatent->slotname)[0] == 0 || + namestrcmp(&shstatent->slotname, NameStr(slot->data.name)) != 0) { memset(statent, 0, sizeof(*statent)); - namestrcpy(&statent->slotname, NameStr(slot->data.name)); + namestrcpy(&shstatent->slotname, NameStr(slot->data.name)); } pgstat_unlock_entry(entry_ref); @@ -187,7 +187,13 @@ pgstat_fetch_replslot(NameData slotname) void pgstat_replslot_to_serialized_name_cb(const PgStatShared_Common *header, NameData *name) { - namestrcpy(name, Name
Re: START_REPLICATION SLOT causing a crash in an assert build
On Wed, Oct 05, 2022 at 11:24:57PM -0400, Jonathan S. Katz wrote:
> On 10/5/22 8:44 PM, Andres Freund wrote:
>> I have two ideas how to fix it. As a design constraint, I'd be interested in
>> the RMTs opinion on the following:
>> Is a cleaner fix that changes the stats format (i.e. existing stats will be
>> thrown away when upgrading) or one that doesn't change the stats format
>> preferrable?
>
> [My opinion, will let Michael/John chime in]
>
> Ideally we would keep the stats on upgrade -- I think that's the better user
> experience.

The release has not happened yet, so I would be fine to remain flexible and
bump again PGSTAT_FILE_FORMAT_ID so as we have the cleanest approach in place
for the release and the future. At the end, we would throw automatically the
data of a file that's marked with a version that does not match with what we
expect at load time, so there's a limited impact on the user experience,
except, well, losing these stats, of course.

--
Michael
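The load-time behavior described above — stats whose on-disk format id does not match the running binary's are thrown away rather than misread — can be sketched like this. The constant name and value below are made up for illustration; PostgreSQL's real PGSTAT_FILE_FORMAT_ID and reader live in the pgstat code:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical format id; bumped whenever the on-disk layout changes. */
#define STATS_FORMAT_ID 0x01A5BC9EU

/* Returns true only if the file begins with the format id this binary
 * expects.  A mismatch (or a truncated header) means the file is
 * discarded and the server starts with fresh, empty stats. */
bool stats_file_usable(FILE *fp)
{
    uint32_t format_id;

    if (fread(&format_id, sizeof(format_id), 1, fp) != 1)
        return false;           /* truncated header: discard */
    return format_id == STATS_FORMAT_ID;
}
```

This is why bumping the id before release is cheap: the only user-visible cost is one set of lost counters at the upgrade boundary, never a misparsed file.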
Re: START_REPLICATION SLOT causing a crash in an assert build
On 10/5/22 8:44 PM, Andres Freund wrote:
> Hi,
>
> On 2022-10-05 13:00:53 -0400, Jonathan S. Katz wrote:
> > On 9/27/22 1:52 AM, Kyotaro Horiguchi wrote:
> > > Thanks!
> > >
> > > At Mon, 26 Sep 2022 19:53:02 -0700, Andres Freund wrote in
> > > > I wonder if the correct fix here wouldn't be to move the slotname out
> > > > of PgStat_StatReplSlotEntry?
> > >
> > > Ugh. Right. I thought its outer struct as purely the part for the common
> > > header. But we can freely place anything after the header part. I moved
> > > it to the outer struct. I didn't clear that part in
> > > pgstat_create_relation() because it is filled in immediately.
> > >
> > > The attached is that.
> >
> > This is still listed as an open item[1] for v15. Does this fix proposed
> > address the issue?
>
> Unfortunately not - it doesn't even pass the existing tests
> (test_decoding/001_repl_stats fails) :(.

Thanks for checking.

> The reason for that is that with the patch nothing restores the slotname
> when reading stats from disk. That turns out not to cause immediate issues,
> but at the next shutdown the name won't be set, and we'll serialize the
> stats data with an empty string as the name.

Ah.

> I have two ideas how to fix it. As a design constraint, I'd be interested in
> the RMTs opinion on the following:
> Is a cleaner fix that changes the stats format (i.e. existing stats will be
> thrown away when upgrading) or one that doesn't change the stats format
> preferrable?

[My opinion, will let Michael/John chime in]

Ideally we would keep the stats on upgrade -- I think that's the better user
experience.

Thanks,

Jonathan
Re: START_REPLICATION SLOT causing a crash in an assert build
Hi, On 2022-10-05 13:00:53 -0400, Jonathan S. Katz wrote: > On 9/27/22 1:52 AM, Kyotaro Horiguchi wrote: > > Thanks! > > > > At Mon, 26 Sep 2022 19:53:02 -0700, Andres Freund > > wrote in > > > I wonder if the correct fix here wouldn't be to move the slotname out of > > > PgStat_StatReplSlotEntry? > > > > Ugh. Right. I thought its outer struct as purely the part for the > > common header. But we can freely place anything after the header > > part. I moved it to the outer struct. I didn't clear that part in > > pgstat_create_relation() because it is filled in immediately. > > > > The attached is that. > > This is still listed as an open item[1] for v15. Does this fix proposed > address the issue? Unfortunately not - it doesn't even pass the existing tests (test_decoding/001_repl_stats fails) :(. The reason for that is that with the patch nothing restores the slotname when reading stats from disk. That turns out not to cause immediate issues, but at the next shutdown the name won't be set, and we'll serialize the stats data with an empty string as the name. I have two ideas how to fix it. As a design constraint, I'd be interested in the RMTs opinion on the following: Is a cleaner fix that changes the stats format (i.e. existing stats will be thrown away when upgrading) or one that doesn't change the stats format preferrable? Greetings, Andres Freund
Re: START_REPLICATION SLOT causing a crash in an assert build
On 9/27/22 1:52 AM, Kyotaro Horiguchi wrote: > Thanks! > > At Mon, 26 Sep 2022 19:53:02 -0700, Andres Freund wrote in > > I wonder if the correct fix here wouldn't be to move the slotname out of > > PgStat_StatReplSlotEntry? > > Ugh. Right. I thought its outer struct as purely the part for the common header. But we can freely place anything after the header part. I moved it to the outer struct. I didn't clear that part in pgstat_create_relation() because it is filled in immediately. > > The attached is that. This is still listed as an open item[1] for v15. Does this fix proposed address the issue? Thanks, Jonathan [1] https://wiki.postgresql.org/wiki/PostgreSQL_15_Open_Items
Re: START_REPLICATION SLOT causing a crash in an assert build
Thanks! At Mon, 26 Sep 2022 19:53:02 -0700, Andres Freund wrote in > I wonder if the correct fix here wouldn't be to move the slotname out of > PgStat_StatReplSlotEntry? Ugh. Right. I thought its outer struct as purely the part for the common header. But we can freely place anything after the header part. I moved it to the outer struct. I didn't clear that part in pgstat_create_relation() because it is filled in immediately. The attached is that. > On 2022-09-16 14:37:17 +0900, Kyotaro Horiguchi wrote: ... > > @@ -307,6 +313,10 @@ static const PgStat_KindInfo > > pgstat_kind_infos[PGSTAT_NUM_KINDS] = { > > .shared_size = sizeof(PgStatShared_ReplSlot), > > .shared_data_off = offsetof(PgStatShared_ReplSlot, stats), > > .shared_data_len = sizeof(((PgStatShared_ReplSlot *) 0)->stats), > > + /* reset doesn't wipe off slot name */ > > + .reset_off = offsetof(PgStat_StatReplSlotEntry, spill_txns), > > + .reset_len = sizeof(((PgStatShared_ReplSlot *) 0)->stats), > > + offsetof(PgStat_StatReplSlotEntry, spill_txns), > > I'm confused what this offsetof does here? It's not even assigned to a > specific field? Am I missing something? > > Also, wouldn't we need to subtract something of the size? Yeah, I felt it confusing. The last line above is offset from just after the header part (it is PgStat_, not PgStatShared_). I first wrote that as you suggested but rewrote to shorter representation. 
> > > diff --git a/src/backend/utils/activity/pgstat_shmem.c > > b/src/backend/utils/activity/pgstat_shmem.c > > index ac98918688..09a8c3873c 100644 > > --- a/src/backend/utils/activity/pgstat_shmem.c > > +++ b/src/backend/utils/activity/pgstat_shmem.c > > @@ -915,8 +915,9 @@ shared_stat_reset_contents(PgStat_Kind kind, > > PgStatShared_Common *header, > > { > > const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind); > > > > - memset(pgstat_get_entry_data(kind, header), 0, > > - pgstat_get_entry_len(kind)); > > + memset((char *)pgstat_get_entry_data(kind, header) + > > + kind_info->reset_off, 0, > > + kind_info->reset_len); > > > > if (kind_info->reset_timestamp_cb) > > kind_info->reset_timestamp_cb(header, ts); > > This likely doesn't quite conform to what pgindent wants... In the first place, it's ugly... regards. -- Kyotaro Horiguchi NTT Open Source Software Center >From 5789b35c1eca77ad63066e6671921e1e8a656372 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Tue, 27 Sep 2022 14:28:56 +0900 Subject: [PATCH v1] Do not reset slot name of replication slot stats pg_stat_reset_replication_slot() clears counters involving slot name, which is not intended and causes crash. Move slot name out of PgStat_StatReplSlotEntry to PgStatShared_Database so that the name won't be wiped out by a counter reset. --- src/backend/utils/activity/pgstat_replslot.c | 12 ++-- src/include/pgstat.h | 1 - src/include/utils/pgstat_internal.h | 1 + 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c index 2039ac3147..0a383f9f32 100644 --- a/src/backend/utils/activity/pgstat_replslot.c +++ b/src/backend/utils/activity/pgstat_replslot.c @@ -86,7 +86,7 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re * Any mismatch should have been fixed in pgstat_create_replslot() or * pgstat_acquire_replslot(). 
*/ - Assert(namestrcmp(&statent->slotname, NameStr(slot->data.name)) == 0); + Assert(namestrcmp(&shstatent->slotname, NameStr(slot->data.name)) == 0); /* Update the replication slot statistics */ #define REPLSLOT_ACC(fld) statent->fld += repSlotStat->fld @@ -124,7 +124,7 @@ pgstat_create_replslot(ReplicationSlot *slot) * if we previously crashed after dropping a slot. */ memset(&shstatent->stats, 0, sizeof(shstatent->stats)); - namestrcpy(&shstatent->stats.slotname, NameStr(slot->data.name)); + namestrcpy(&shstatent->slotname, NameStr(slot->data.name)); pgstat_unlock_entry(entry_ref); } @@ -148,11 +148,11 @@ pgstat_acquire_replslot(ReplicationSlot *slot) * NB: need to accept that there might be stats from an older slot, e.g. * if we previously crashed after dropping a slot. */ - if (NameStr(statent->slotname)[0] == 0 || - namestrcmp(&statent->slotname, NameStr(slot->data.name)) != 0) + if (NameStr(shstatent->slotname)[0] == 0 || + namestrcmp(&shstatent->slotname, NameStr(slot->data.name)) != 0) { memset(statent, 0, sizeof(*statent)); - namestrcpy(&statent->slotname, NameStr(slot->data.name)); + namestrcpy(&shstatent->slotname, NameStr(slot->data.name)); } pgstat_unlock_entry(entry_ref); @@ -187,7 +187,7 @@ pgstat_fetch_replslot(NameData slotname) void pgstat_replslot_to_serialized_name_cb(const PgStatShared_Common *header, NameData *name) { - namestrcpy(name,
Re: START_REPLICATION SLOT causing a crash in an assert build
Hi, I wonder if the correct fix here wouldn't be to move the slotname out of PgStat_StatReplSlotEntry? On 2022-09-16 14:37:17 +0900, Kyotaro Horiguchi wrote: > diff --git a/src/backend/utils/activity/pgstat.c > b/src/backend/utils/activity/pgstat.c > index 6224c498c2..ed3f3af4d9 100644 > --- a/src/backend/utils/activity/pgstat.c > +++ b/src/backend/utils/activity/pgstat.c > @@ -263,6 +263,8 @@ static const PgStat_KindInfo > pgstat_kind_infos[PGSTAT_NUM_KINDS] = { > .shared_size = sizeof(PgStatShared_Database), > .shared_data_off = offsetof(PgStatShared_Database, stats), > .shared_data_len = sizeof(((PgStatShared_Database *) 0)->stats), > + .reset_off = 0, > + .reset_len = sizeof(((PgStatShared_Database *) 0)->stats), > .pending_size = sizeof(PgStat_StatDBEntry), > > .flush_pending_cb = pgstat_database_flush_cb, > @@ -277,6 +279,8 @@ static const PgStat_KindInfo > pgstat_kind_infos[PGSTAT_NUM_KINDS] = { > .shared_size = sizeof(PgStatShared_Relation), > .shared_data_off = offsetof(PgStatShared_Relation, stats), > .shared_data_len = sizeof(((PgStatShared_Relation *) 0)->stats), > + .reset_off = 0, > + .reset_len = sizeof(((PgStatShared_Relation *) 0)->stats), > .pending_size = sizeof(PgStat_TableStatus), > > .flush_pending_cb = pgstat_relation_flush_cb, > @@ -291,6 +295,8 @@ static const PgStat_KindInfo > pgstat_kind_infos[PGSTAT_NUM_KINDS] = { > .shared_size = sizeof(PgStatShared_Function), > .shared_data_off = offsetof(PgStatShared_Function, stats), > .shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats), > + .reset_off = 0, > + .reset_len = sizeof(((PgStatShared_Function *) 0)->stats), > .pending_size = sizeof(PgStat_BackendFunctionEntry), > > .flush_pending_cb = pgstat_function_flush_cb, > @@ -307,6 +313,10 @@ static const PgStat_KindInfo > pgstat_kind_infos[PGSTAT_NUM_KINDS] = { > .shared_size = sizeof(PgStatShared_ReplSlot), > .shared_data_off = offsetof(PgStatShared_ReplSlot, stats), > .shared_data_len = sizeof(((PgStatShared_ReplSlot *) 0)->stats), > 
+ /* reset doesn't wipe off slot name */ > + .reset_off = offsetof(PgStat_StatReplSlotEntry, spill_txns), > + .reset_len = sizeof(((PgStatShared_ReplSlot *) 0)->stats), > + offsetof(PgStat_StatReplSlotEntry, spill_txns), I'm confused what this offsetof does here? It's not even assigned to a specific field? Am I missing something? Also, wouldn't we need to subtract something of the size? > diff --git a/src/backend/utils/activity/pgstat_shmem.c > b/src/backend/utils/activity/pgstat_shmem.c > index ac98918688..09a8c3873c 100644 > --- a/src/backend/utils/activity/pgstat_shmem.c > +++ b/src/backend/utils/activity/pgstat_shmem.c > @@ -915,8 +915,9 @@ shared_stat_reset_contents(PgStat_Kind kind, > PgStatShared_Common *header, > { > const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind); > > - memset(pgstat_get_entry_data(kind, header), 0, > -pgstat_get_entry_len(kind)); > + memset((char *)pgstat_get_entry_data(kind, header) + > +kind_info->reset_off, 0, > +kind_info->reset_len); > > if (kind_info->reset_timestamp_cb) > kind_info->reset_timestamp_cb(header, ts); This likely doesn't quite conform to what pgindent wants... Greetings, Andres Freund
Re: START_REPLICATION SLOT causing a crash in an assert build
At Mon, 19 Sep 2022 11:04:03 -0500, Jaime Casanova wrote in > On Fri, Sep 16, 2022 at 02:37:17PM +0900, Kyotaro Horiguchi wrote: > > At Thu, 15 Sep 2022 11:15:12 -0500, Jaime Casanova > > wrote in > > > It fails at ./src/backend/utils/activity/pgstat_shmem.c:530 inside > > > > Thanks for the info. I reproduced by make check.. stupid.. > > > > It's the thinko about the base address of reset_off. > > > > So the attached doesn't crash.. > > > > Hi, > > Just confirming there have been no crash since this last patch. Thanks for the confirmation. Although I'm not sure whether this is the right direction, this seems to be an open item for 15? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: START_REPLICATION SLOT causing a crash in an assert build
On Fri, Sep 16, 2022 at 02:37:17PM +0900, Kyotaro Horiguchi wrote: > At Thu, 15 Sep 2022 11:15:12 -0500, Jaime Casanova > wrote in > > It fails at ./src/backend/utils/activity/pgstat_shmem.c:530 inside > > Thanks for the info. I reproduced by make check.. stupid.. > > It's the thinko about the base address of reset_off. > > So the attached doesn't crash.. > Hi, Just confirming there have been no crash since this last patch. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: START_REPLICATION SLOT causing a crash in an assert build
At Thu, 15 Sep 2022 11:15:12 -0500, Jaime Casanova wrote in > It fails at ./src/backend/utils/activity/pgstat_shmem.c:530 inside Thanks for the info. I reproduced by make check.. stupid.. It's the thinko about the base address of reset_off. So the attached doesn't crash.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index 6224c498c2..ed3f3af4d9 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -263,6 +263,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_Database), .shared_data_off = offsetof(PgStatShared_Database, stats), .shared_data_len = sizeof(((PgStatShared_Database *) 0)->stats), + .reset_off = 0, + .reset_len = sizeof(((PgStatShared_Database *) 0)->stats), .pending_size = sizeof(PgStat_StatDBEntry), .flush_pending_cb = pgstat_database_flush_cb, @@ -277,6 +279,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_Relation), .shared_data_off = offsetof(PgStatShared_Relation, stats), .shared_data_len = sizeof(((PgStatShared_Relation *) 0)->stats), + .reset_off = 0, + .reset_len = sizeof(((PgStatShared_Relation *) 0)->stats), .pending_size = sizeof(PgStat_TableStatus), .flush_pending_cb = pgstat_relation_flush_cb, @@ -291,6 +295,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_Function), .shared_data_off = offsetof(PgStatShared_Function, stats), .shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats), + .reset_off = 0, + .reset_len = sizeof(((PgStatShared_Function *) 0)->stats), .pending_size = sizeof(PgStat_BackendFunctionEntry), .flush_pending_cb = pgstat_function_flush_cb, @@ -307,6 +313,10 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_ReplSlot), .shared_data_off = 
offsetof(PgStatShared_ReplSlot, stats), .shared_data_len = sizeof(((PgStatShared_ReplSlot *) 0)->stats), + /* reset doesn't wipe off slot name */ + .reset_off = offsetof(PgStat_StatReplSlotEntry, spill_txns), + .reset_len = sizeof(((PgStatShared_ReplSlot *) 0)->stats), + offsetof(PgStat_StatReplSlotEntry, spill_txns), .reset_timestamp_cb = pgstat_replslot_reset_timestamp_cb, .to_serialized_name = pgstat_replslot_to_serialized_name_cb, @@ -323,6 +333,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_Subscription), .shared_data_off = offsetof(PgStatShared_Subscription, stats), .shared_data_len = sizeof(((PgStatShared_Subscription *) 0)->stats), + .reset_off = 0, + .reset_len = sizeof(((PgStatShared_Subscription *) 0)->stats), .pending_size = sizeof(PgStat_BackendSubEntry), .flush_pending_cb = pgstat_subscription_flush_cb, diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c index ac98918688..09a8c3873c 100644 --- a/src/backend/utils/activity/pgstat_shmem.c +++ b/src/backend/utils/activity/pgstat_shmem.c @@ -915,8 +915,9 @@ shared_stat_reset_contents(PgStat_Kind kind, PgStatShared_Common *header, { const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind); - memset(pgstat_get_entry_data(kind, header), 0, - pgstat_get_entry_len(kind)); + memset((char *)pgstat_get_entry_data(kind, header) + + kind_info->reset_off, 0, + kind_info->reset_len); if (kind_info->reset_timestamp_cb) kind_info->reset_timestamp_cb(header, ts); diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h index 901d2041d6..144fe5814c 100644 --- a/src/include/utils/pgstat_internal.h +++ b/src/include/utils/pgstat_internal.h @@ -215,6 +215,13 @@ typedef struct PgStat_KindInfo uint32 shared_data_off; uint32 shared_data_len; + /* +* The offset/size of the region to wipe out when the entry is reset. +* reset_off is relative to shared_data_off. 
+*/ + uint32 reset_off; + uint32 reset_len; + /* * The size of the pending data for this kind. E.g. how large * PgStat_EntryRef->pending is. Used for allocations.
Re: START_REPLICATION SLOT causing a crash in an assert build
On Thu, Sep 15, 2022 at 05:30:11PM +0900, Kyotaro Horiguchi wrote: > At Thu, 15 Sep 2022 01:26:15 -0500, Jaime Casanova > wrote in > > On Tue, Sep 13, 2022 at 10:07:50PM -0500, Jaime Casanova wrote: > > > On Tue, Sep 13, 2022 at 06:48:45PM +0900, Kyotaro Horiguchi wrote: > > > > > > > > Another measure would be to add the region to wipe-out on reset to > > > > PgStat_KindInfo, but it seems too much.. (attached) > > > > > > > > > > This patch solves the problem, i didn't like the other solution because > > > as you say it partly nullify the protection of the assertion. > > > > > > > I talked too fast, while it solves the immediate problem the patch as is > > causes other crashes. > > Where did the crash happen? Is it a bug introduced by it? Or does it > root to other cause? > Just compile and run the installcheck tests. It fails at ./src/backend/utils/activity/pgstat_shmem.c:530 inside pgstat_release_entry_ref() because it expects a "deadbeef"; it seems to be a magic value, but I cannot find what it is used for. Attached is a backtrace. 
-- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 set = {__val = {4194304, 140727848465760, 2, 6, 6808139, 94494822015184, 4611686018427388799, 139771048516262, 0, 281470681751456, 0, 0, 0, 0, 0, 0}} pid = tid = ret = #1 0x7f1efb762535 in __GI_abort () at abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = { 0, 0, 0, 0, 0, 139771046014965, 2, 3559591060477507152, 7018350264137834804, 94494822015184, 7003716880224747600, 0, 9726297134600432896, 140727848466000, 0, 140727848466864}}, sa_flags = 1246535888, sa_restorer = 0x0} sigs = {__val = {32, 0 }} #2 0x55f14ab8e280 in ExceptionalCondition ( conditionName=0x55f14ad70420 "entry_ref->shared_stats->magic == 0xdeadbeef", errorType=0x55f14ad70073 "FailedAssertion", fileName=0x55f14ad702da "pgstat_shmem.c", lineNumber=530) at assert.c:69 No locals. #3 0x55f14aa15a1e in pgstat_release_entry_ref (key=..., entry_ref=0x55f14bf71098, discard_pending=false) at pgstat_shmem.c:530 __func__ = "pgstat_release_entry_ref" #4 0x55f14aa15f68 in pgstat_release_matching_entry_refs (discard_pending=false, match=0x0, match_data=0) at pgstat_shmem.c:699 i = {cur = 255, end = 1, done = false} ent = 0x55f14bf450f0 #5 0x55f14aa15fc0 in pgstat_release_all_entry_refs (discard_pending=false) at pgstat_shmem.c:715 No locals. #6 0x55f14aa15203 in pgstat_detach_shmem () at pgstat_shmem.c:238 No locals. #7 0x55f14aa0de82 in pgstat_shutdown_hook (code=0, arg=0) at pgstat.c:520 No locals. 
#8 0x55f14a9b2b27 in shmem_exit (code=0) at ipc.c:239 __func__ = "shmem_exit" #9 0x55f14a9b29df in proc_exit_prepare (code=0) at ipc.c:194 __func__ = "proc_exit_prepare" #10 0x55f14a9b2930 in proc_exit (code=0) at ipc.c:107 __func__ = "proc_exit" #11 0x55f14a9ed1a0 in PostgresMain (dbname=0x55f14bed0e90 "regression", username=0x55f14bed0e68 "jcasanov") at postgres.c:4795 firstchar = 88 input_message = {data = 0x55f14bea4900 "", len = 0, maxlen = 1024, cursor = 0} local_sigjmp_buf = {{__jmpbuf = {0, -2769250407385151644, 94494822015184, 140727848468560, 0, 0, -2769250407297071260, -8248146431642187932}, __mask_was_saved = 1, __saved_mask = {__val = {4194304, 8534995794563506275, 15679, 15680, 979, 18446744073709551536, 0, 0, 139771044971635, 3904, 0, 140727848467536, 94494822015184, 140727848468560, 94494829323263, 15616 send_ready_for_query = false idle_in_transaction_timeout_enabled = false idle_session_timeout_enabled = false __func__ = "PostgresMain" #12 0x55f14a924a1d in BackendRun (port=0x55f14becad20) at postmaster.c:4504 No locals. #13 0x55f14a924369 in BackendStartup (port=0x55f14becad20) at postmaster.c:4232 bn = 0x55f14bec7e80 pid = 0 __func__ = "BackendStartup" #14 0x55f14a9207ae in ServerLoop () at postmaster.c:1806 port = 0x55f14becad20 i = 2 rmask = {fds_bits = {128, 0 }} selres = 1 now = 1663253503 readmask = {fds_bits = {224, 0 }} nSockets = 8 last_lockfile_recheck_time = 1663253479 last_touch_time = 1663253239 __func__ = "ServerLoop" #15 0x55f14a91fffd in PostmasterMain (argc=5, argv=0x55f14be9dec0) at postmaster.c:1478 opt = -1 status = 0 userDoption = 0x55f14bec1900 "data1" listen_addr_saved = true i = 64 output_config_variable = 0x0 __func__ = "PostmasterMain" #16 0x55f14a81f3fc in main (argc=5, argv=0x55f14be9de
Re: START_REPLICATION SLOT causing a crash in an assert build
At Thu, 15 Sep 2022 01:26:15 -0500, Jaime Casanova wrote in > On Tue, Sep 13, 2022 at 10:07:50PM -0500, Jaime Casanova wrote: > > On Tue, Sep 13, 2022 at 06:48:45PM +0900, Kyotaro Horiguchi wrote: > > > > > > Another measure would be to add the region to wipe-out on reset to > > > PgStat_KindInfo, but it seems too much.. (attached) > > > > > > > This patch solves the problem, i didn't like the other solution because > > as you say it partly nullify the protection of the assertion. > > > > I talked too fast, while it solves the immediate problem the patch as is > causes other crashes. Where did the crash happen? Is it a bug introduced by it? Or does it root to other cause? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: START_REPLICATION SLOT causing a crash in an assert build
On Tue, Sep 13, 2022 at 10:07:50PM -0500, Jaime Casanova wrote: > On Tue, Sep 13, 2022 at 06:48:45PM +0900, Kyotaro Horiguchi wrote: > > > > Another measure would be to add the region to wipe-out on reset to > > PgStat_KindInfo, but it seems too much.. (attached) > > > > This patch solves the problem, i didn't like the other solution because > as you say it partly nullify the protection of the assertion. > I talked too fast, while it solves the immediate problem the patch as is causes other crashes. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: START_REPLICATION SLOT causing a crash in an assert build
On Tue, Sep 13, 2022 at 06:48:45PM +0900, Kyotaro Horiguchi wrote: > Nice finding. > > At Tue, 13 Sep 2022 00:39:45 -0500, Jaime Casanova > wrote in > > and the problem seems to be that after zero'ing the stats that includes > > the name of the replication slot, this simple patch fixes it... not sure > > if it's the right fix though... > > That doesn't work. since what that function clears is not the name in > the slot struct but that in stats entry. > you're right... the curious thing is that I tested it and it worked, but now it doesn't... maybe it was too late for me... > The cause is what pg_stat_reset_replslot wants to do does not match > what pgstat feature thinks as reset. > [...] > > Another measure would be to add the region to wipe-out on reset to > PgStat_KindInfo, but it seems too much.. (attached) > This patch solves the problem; I didn't like the other solution because, as you say, it partly nullifies the protection of the assertion. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: START_REPLICATION SLOT causing a crash in an assert build
Nice finding. At Tue, 13 Sep 2022 00:39:45 -0500, Jaime Casanova wrote in > and the problem seems to be that after zero'ing the stats that includes > the name of the replication slot, this simple patch fixes it... not sure > if it's the right fix though... That doesn't work, since what that function clears is not the name in the slot struct but the one in the stats entry. The cause is that what pg_stat_reset_replslot wants to do does not match what the pgstat machinery regards as a reset. Unfortunately, we don't have a function that leaves a portion alone after a reset. However, fetching the stats entry in pgstat_reset_replslot is ugly.. I'm not sure this is less ugly, but it works if pgstat_report_replslot sets the name when it is found to have been wiped out... But that would half nullify the protection by the assertion just after. --- a/src/backend/utils/activity/pgstat_replslot.c +++ b/src/backend/utils/activity/pgstat_replslot.c @@ -83,9 +83,11 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re statent = &shstatent->stats; /* -* Any mismatch should have been fixed in pgstat_create_replslot() or -* pgstat_acquire_replslot(). +* pgstat_create_replslot() and pgstat_acquire_replslot() enters the name, +* but pgstat_reset_replslot() may clear it. */ + if (statent->slotname.data[0] == 0) + namestrcpy(&shstatent->stats.slotname, NameStr(slot->data.name)); Assert(namestrcmp(&statent->slotname, NameStr(slot->data.name)) == 0); Another measure would be to add the region to wipe-out on reset to PgStat_KindInfo, but it seems too much.. (attached) regards. 
-- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c index 6224c498c2..ab88ebbc64 100644 --- a/src/backend/utils/activity/pgstat.c +++ b/src/backend/utils/activity/pgstat.c @@ -263,6 +263,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_Database), .shared_data_off = offsetof(PgStatShared_Database, stats), .shared_data_len = sizeof(((PgStatShared_Database *) 0)->stats), + .reset_off = offsetof(PgStatShared_Database, stats), + .reset_len = sizeof(((PgStatShared_Database *) 0)->stats), .pending_size = sizeof(PgStat_StatDBEntry), .flush_pending_cb = pgstat_database_flush_cb, @@ -277,6 +279,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_Relation), .shared_data_off = offsetof(PgStatShared_Relation, stats), .shared_data_len = sizeof(((PgStatShared_Relation *) 0)->stats), + .reset_off = offsetof(PgStatShared_Relation, stats), + .reset_len = sizeof(((PgStatShared_Relation *) 0)->stats), .pending_size = sizeof(PgStat_TableStatus), .flush_pending_cb = pgstat_relation_flush_cb, @@ -291,6 +295,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_Function), .shared_data_off = offsetof(PgStatShared_Function, stats), .shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats), + .reset_off = offsetof(PgStatShared_Function, stats), + .reset_len = sizeof(((PgStatShared_Function *) 0)->stats), .pending_size = sizeof(PgStat_BackendFunctionEntry), .flush_pending_cb = pgstat_function_flush_cb, @@ -307,6 +313,9 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_ReplSlot), .shared_data_off = offsetof(PgStatShared_ReplSlot, stats), .shared_data_len = sizeof(((PgStatShared_ReplSlot *) 0)->stats), + .reset_off = offsetof(PgStatShared_ReplSlot, stats.spill_txns), + 
.reset_len = sizeof(PgStatShared_ReplSlot) - + offsetof(PgStatShared_ReplSlot, stats.spill_txns), .reset_timestamp_cb = pgstat_replslot_reset_timestamp_cb, .to_serialized_name = pgstat_replslot_to_serialized_name_cb, @@ -323,6 +332,8 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = { .shared_size = sizeof(PgStatShared_Subscription), .shared_data_off = offsetof(PgStatShared_Subscription, stats), .shared_data_len = sizeof(((PgStatShared_Subscription *) 0)->stats), + .reset_off = offsetof(PgStatShared_Subscription, stats), + .reset_len = sizeof(((PgStatShared_Subscription *) 0)->stats), .pending_size = sizeof(PgStat_BackendSubEntry), .flush_pending_cb = pgstat_subscription_flush_cb, diff --git a/src/backend/utils/activity/pgstat_shmem
Re: START_REPLICATION SLOT causing a crash in an assert build
On Wed, Sep 07, 2022 at 12:39:08PM -0700, Andres Freund wrote: > Hi, > > On 2022-09-06 18:40:49 -0500, Jaime Casanova wrote: > > I'm not sure what is causing this, but I have seen this twice. The > > second time without activity after changing the set of tables in a > > PUBLICATION. This crash happens after a reset of statistics for a replication slot > Can you describe the steps to reproduce? > bin/pg_ctl -D data1 initdb bin/pg_ctl -D data1 -l logfile1 -o "-c port=54315 -c wal_level=logical" start bin/psql -p 54315 postgres < Which git commit does this happen on? > just tested again on f5047c1293acce3c6c3802b06825aa3a9f9aa55a > > > gdb says that debug_query_string contains: > > > > """ > > START_REPLICATION SLOT "sub_pgbench" LOGICAL 0/0 (proto_version '3', > > publication_names '"pub_pgbench"')START_REPLICATION SLOT "sub_pgbench" > > LOGICAL 0/0 (proto_version '3', publication_names '"pub_pgbench"') > > """ > > > > attached the backtrace. > > > > > #2 0x5559bfd4f0ed in ExceptionalCondition ( > > conditionName=0x5559bff30e20 "namestrcmp(&statent->slotname, > > NameStr(slot->data.name)) == 0", errorType=0x5559bff30e0d > > "FailedAssertion", fileName=0x5559bff30dbb "pgstat_replslot.c", > > lineNumber=89) at assert.c:69 > > what are statent->slotname and slot->data.name? > and the problem seems to be that after zero'ing the stats that includes the name of the replication slot, this simple patch fixes it... not sure if it's the right fix though... 
-- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c index b77c05ab5f..db44d51b4c 100644 --- a/src/backend/utils/activity/pgstat_replslot.c +++ b/src/backend/utils/activity/pgstat_replslot.c @@ -65,6 +65,7 @@ pgstat_reset_replslot(const char *name) /* reset this one entry */ pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid, ReplicationSlotIndex(slot)); + namestrcpy(&slot->data.name, name); } /*
Re: START_REPLICATION SLOT causing a crash in an assert build
On Wed, Sep 07, 2022 at 12:39:08PM -0700, Andres Freund wrote: > Hi, > > On 2022-09-06 18:40:49 -0500, Jaime Casanova wrote: > > I'm not sure what is causing this, but I have seen this twice. The > > second time without activity after changing the set of tables in a > > PUBLICATION. > > Can you describe the steps to reproduce? > I'm still trying to determine that > Which git commit does this happen on? > 6e55ea79faa56db85a2b6c5bf94cee8acf8bfdb8 (Stamp 15beta4) > > > gdb says that debug_query_string contains: > > > > """ > > START_REPLICATION SLOT "sub_pgbench" LOGICAL 0/0 (proto_version '3', > > publication_names '"pub_pgbench"')START_REPLICATION SLOT "sub_pgbench" > > LOGICAL 0/0 (proto_version '3', publication_names '"pub_pgbench"') > > """ > > > > attached the backtrace. > > > > > #2 0x5559bfd4f0ed in ExceptionalCondition ( > > conditionName=0x5559bff30e20 "namestrcmp(&statent->slotname, > > NameStr(slot->data.name)) == 0", errorType=0x5559bff30e0d > > "FailedAssertion", fileName=0x5559bff30dbb "pgstat_replslot.c", > > lineNumber=89) at assert.c:69 > > what are statent->slotname and slot->data.name? > slot->data.name seems to be the replication slot record, and statent->slotname comes from the shared-memory stats for that slot. And the assert fires when statent->slotname.data comes up empty, which is not frequent but happens from time to time. BTW, while looking at this I found that we can drop a publication while there are active subscriptions pointing to it; is that something we should allow? Anyway, that is not the cause of this, because the replication slot actually exists. -- Jaime Casanova Director de Servicios Profesionales SystemGuards - Consultores de PostgreSQL
Re: START_REPLICATION SLOT causing a crash in an assert build
Hi, On 2022-09-06 18:40:49 -0500, Jaime Casanova wrote: > I'm not sure what is causing this, but I have seen this twice. The > second time without activity after changing the set of tables in a > PUBLICATION. Can you describe the steps to reproduce? Which git commit does this happen on? > gdb says that debug_query_string contains: > > """ > START_REPLICATION SLOT "sub_pgbench" LOGICAL 0/0 (proto_version '3', > publication_names '"pub_pgbench"')START_REPLICATION SLOT "sub_pgbench" > LOGICAL 0/0 (proto_version '3', publication_names '"pub_pgbench"') > """ > > attached the backtrace. > > #2 0x5559bfd4f0ed in ExceptionalCondition ( > conditionName=0x5559bff30e20 "namestrcmp(&statent->slotname, > NameStr(slot->data.name)) == 0", errorType=0x5559bff30e0d "FailedAssertion", > fileName=0x5559bff30dbb "pgstat_replslot.c", > lineNumber=89) at assert.c:69 what are statent->slotname and slot->data.name? Greetings, Andres Freund
START_REPLICATION SLOT causing a crash in an assert build
Hi,

I'm not sure what is causing this, but I have seen this twice. The
second time without activity after changing the set of tables in a
PUBLICATION.

gdb says that debug_query_string contains:

"""
START_REPLICATION SLOT "sub_pgbench" LOGICAL 0/0 (proto_version '3',
publication_names '"pub_pgbench"')START_REPLICATION SLOT "sub_pgbench"
LOGICAL 0/0 (proto_version '3', publication_names '"pub_pgbench"')
"""

attached the backtrace.

--
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {4194304, 140731069195120, 2, 6, 6807736,
          93843951759568, 4611686018427388799, 140276820691622, 0,
          281470681751456, 0, 0, 0, 0, 0, 0}}
        pid =
        tid =
        ret =
#1  0x7f94bdd51535 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0},
          sa_mask = {__val = {0, 0, 0, 0, 0, 140276818190325, 2,
            4122259537847870528, 7018350267711514210, 93843951759568,
            7003715780713148896, 0, 1404880143296576, 140731069195360, 0,
            140731069196224}}, sa_flags = -1083658032, sa_restorer = 0x0}
        sigs = {__val = {32, 0}}
#2  0x5559bfd4f0ed in ExceptionalCondition (
        conditionName=0x5559bff30e20 "namestrcmp(&statent->slotname,
          NameStr(slot->data.name)) == 0",
        errorType=0x5559bff30e0d "FailedAssertion",
        fileName=0x5559bff30dbb "pgstat_replslot.c",
        lineNumber=89) at assert.c:69
        No locals.
#3  0x5559bfbd4353 in pgstat_report_replslot (slot=0x7f94bd839280,
        repSlotStat=0x7ffe81636820) at pgstat_replslot.c:89
        entry_ref = 0x5559c0876130
        shstatent = 0x7f94b48f2000
        statent = 0x7f94b48f2018
#4  0x5559bfb09f76 in UpdateDecodingStats (ctx=0x5559c091ca50)
        at logical.c:1862
        rb = 0x5559c093f5c0
        repSlotStat = {slotname = {
            data = "\300\365\223\300YU\000\000X\206\227\300YU\000\000\250\031\225\300YU\000\000\225\257\261\277\000\000\000\000\240hc\201\376\177\000\000aT\261\277YU", '\000' },
          spill_txns = 0, spill_count = 0, spill_bytes = 0, stream_txns = 0,
          stream_count = 0, stream_bytes = 0, total_txns = 1,
          total_bytes = 684, stat_reset_timestamp = 140731069196560}
        __func__ = "UpdateDecodingStats"
#5  0x5559bfb03d2e in DecodeCommit (ctx=0x5559c091ca50, buf=0x7ffe81636b30,
        parsed=0x7ffe81636930, xid=74033, two_phase=false) at decode.c:706
        origin_lsn = 0
        commit_time = 715805149160794
        origin_id = 0
        i = 0
#6  0x5559bfb03055 in xact_decode (ctx=0x5559c091ca50, buf=0x7ffe81636b30)
        at decode.c:216
        xlrec = 0x5559c095b608
        parsed = {xact_time = 715805149160794, xinfo = 73, dbId = 65390,
          tsId = 1663, nsubxacts = 0, subxacts = 0x0, nrels = 0,
          xnodes = 0x0, nstats = 0, stats = 0x0, nmsgs = 5,
          msgs = 0x5559c095b620, twophase_xid = 0,
          twophase_gid = '\000' , nabortrels = 0, abortnodes = 0x0,
          nabortstats = 0, abortstats = 0x0, origin_lsn = 0,
          origin_timestamp = 0}
        xid = 74033
        two_phase = false
        builder = 0x5559c0951640
        reorder = 0x5559c093f5c0
        r = 0x5559c091ce10
        info = 0 '\000'
        __func__ = "xact_decode"
#7  0x5559bfb02d78 in LogicalDecodingProcessRecord (ctx=0x5559c091ca50,
        record=0x5559c091ce10) at decode.c:119
        buf = {origptr = 18637212136, endptr = 18637212272,
          record = 0x5559c091ce10}
        txid = 0
        rmgr = {rm_name = 0x5559bfdf3c45 "Transaction",
          rm_redo = 0x5559bf792884, rm_desc = 0x5559bf75aa12,
          rm_identify = 0x5559bf75ab8f, rm_startup = 0x0, rm_cleanup = 0x0,
          rm_mask = 0x0, rm_decode = 0x5559bfb02edd}
#8  0x5559bfb3aecd in XLogSendLogical () at walsender.c:3073
        record = 0x5559c095b5d8
        errm = 0x0
        flushPtr = 18637212272
        __func__ = "XLogSendLogical"
#9  0x5559bfb3a1b4 in WalSndLoop (send_data=0x5559bfb3ae2b)
        at walsender.c:2503
        __func__ = "WalSndLoop"
#10 0x5559bfb389fe in StartLogicalReplication (cmd=0x5559c08795d8)
        at walsender.c:1333
        buf = {data = 0x0, len = 3, maxlen = 1024, cursor = 87}
        qc = {commandTag = 2170776608, nprocessed = 93843959044340}
        __func__ = "StartLogicalReplication"
#11 0x5559bfb394b0 in exec_replication_command (
        cmd_string=0x5559c08518a0 "START_REPLICATION SLOT \"sub_pgbench\" LOGICAL 0/0 (proto_version '3', publication_names '\"pub_pgbench\"')")
        at walsender.c:1843
        cmd = 0x5559c08795d8
        parse_rc = 0
        cmd_node = 0x5559c08795d8
        cmdtag = 0x5559bff17e7a "START_REPLICATION"
        cmd_context = 0x5559c0878620
        old_context = 0x5559c0851780
        __func__ = "exec_replication_command"
#12 0x5559bfbadbbd in PostgresMain (dbname=0x5559c087df10 "pgbench",
        username=0x5559c087dee8 "jcasano