On Thu, Dec 4, 2025 at 7:59 PM shveta malik <[email protected]> wrote: > > On Fri, Dec 5, 2025 at 12:23 AM Masahiko Sawada <[email protected]> wrote: > > > > On Thu, Dec 4, 2025 at 1:30 AM shveta malik <[email protected]> wrote: > > > > > > On Thu, Dec 4, 2025 at 1:54 PM Masahiko Sawada <[email protected]> > > > wrote: > > > > > > > > > > > > I've attached the updated patch that incorporated the review comments > > > > and is rebased to the current HEAD. > > > > > > > > > > Thanks, a few things: > > > > > > > > > 1) > > > + The system automatically increases the > > > + effective WAL level to <literal>logical</literal> when > > > creating the first > > > + logical replication slot, and decreases it back to > > > <literal>replica</literal> > > > + when dropping the last logical replication slot. The current > > > effective WAL > > > + level can be monitored through <xref > > > linkend="guc-effective-wal-level"/> > > > + parameter. > > > > > > Shouldn't we mention about invalidation of slot along with dropping of > > > slot? I have suggested this earlier but I think it got missed to be > > > addressed. > > > > Fixed. Sorry I missed it. > > > > > > > > 2) > > > + else if (sync_replication_slots) > > > + { > > > + /* > > > + * Signal the postmaster to launch the slotsync worker. > > > + * > > > + * XXX: For simplicity, we keep the slotsync worker running > > > + * even after logical decoding is disabled. A future > > > + * improvement can consider starting and stopping the worker > > > + * based on logical decoding status change. > > > + */ > > > + kill(PostmasterPid, SIGUSR1); > > > + } > > > + } > > > + > > > + /* Update the status on shared memory */ > > > + UpdateLogicalDecodingStatus(logical_decoding, true); > > > > > > I see that we have moved 'Update' post slotsync's start attempt. This > > > leaves a possibility that that slot-sync is started sooner than last > > > 'Update' call and thus slotsync may exit with: > > > replication slot synchronization requires "effective_wal_level" >= > > > "logical" on the primary > > > > > > I see that update was prior to the slotsync step in earlier patches. > > > Why have we moved it to a later stage? > > > > You're right. I've reconsidered the operation order and reverted the > > previous change. On further thought, I think it should not happen to > > decode the STATUS_CHANGE record because when applying the > > STATUS_CHANGE record, we disable logical decoding first and then > > invalidate the slots. There is no window where a logical slot can > > creep in on the standby, and on the primary it cannot restart after > > changing wal_level < replica if there is a pre-existing slot. If my > > understanding is right, we should put a sanity check in xlog_decode(). > > > > > > > > 3) > > > + /* Return if someone already started to enable logical decoding */ > > > > > > Shall we update: > > > Return if someone already started to enable logical decoding, or if it > > > is already enabled. > > > > Fixed. > > > > I've attached the updated patches. The 0002 patch is a patch to > > address the above review comments. If it looks good, I'll merge these > > changes. > > > > Thanks, I will review. > > While testing one of the scenarios on v33 patch, I observed a strange > behaviour. I could not reproduce it again. I am not sure if it is due > to the patch. Please have a look once. > The scenario is: > > pub: create a slot to enable logical decoding, create table, insert few > records > standby: create slot and try to get-changes, get-changes hit some error: > > postgres=# SELECT pg_create_logical_replication_slot('slot13', > 'pgoutput', false, false, false); > pg_create_logical_replication_slot > ------------------------------------ > (slot13,0/0306D448) > (1 row) > > postgres=# select pg_logical_slot_get_binary_changes('slot13', NULL, > NULL, 'proto_version', '4', 'publication_names', 'pub'); > ERROR: cannot query non-catalog table "pg_class" during logical decoding >
Hmm, ISTM that the root cause is that we don't synchronously update the XLogLogicalInfo cache on the standby. A backend on the standby who started when logical decoding is disabled keeps holding XLogLogicalInfo = false until the promotion. So even if the startup process enables logical decoding when applying the STATUS_CHANGE record, logical decoding is enabled on the standby but the backend process would start logical decoding with XLogLogicalInfo = false, resulting in RelationIsAccessibleInLogicalDecoding() returns false. I missed the fact that we check XLogLogicalInfoActive() even read paths. Will fix it. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
