On Fri, 25 Jul 2025 at 11:45, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Tue, Jul 22, 2025 at 11:44 PM shveta malik <shveta.ma...@gmail.com> wrote:
> >
> > On Tue, Jul 22, 2025 at 5:03 AM Masahiko Sawada <sawada.m...@gmail.com> 
> > wrote:
> > >
> > > Yes, I agree. The main patch focuses on the part where we
> > > automatically change the effective WAL level upon the logical slot
> > > creation and deletion (and potentially remove 'logical' from
> > > wal_level), and other things are implemented as additional features in
> > > a separate patch.
> >
> >  I am keeping my focus on patch001 until we decide further on how to
> > protect the slot.
>
> Yeah, I also dropped the additional feature patch from the patch set for now.
>
> > Apart from few comments in [1], please find one
> > more concern:
> >
> > There is a race condition between creating and dropping a replication
> > slot when enabling or disabling logical decoding. We might end up with
> > logical decoding disabled even when a logical slot is present.
> >
> > Steps:
> > 1) Set wal_level=replica on primary.
> > 2) Create logical_slot1 which will enable logical decoding, causing
> > effective_wal_level to become logical.
> > 3) Drop logical_slot1 and pause execution inside
> > DisableLogicalDecodingIfNecessary() right after the
> > 'n_inuse_logical_slots' check using a debugger.
> > 4) In another session, create logical_slot2. It will attempt to enable
> > logical-decoding but since it is already enabled,
> > EnsureLogicalDecodingEnabled() will be a no-op.
> > 5) Release debugger of drop-slot, it will disable logical decoding.
> >
> > Ultimately, logical_slot2is present while logical decoding is disabled
> > and thus we see this:
> >
> > postgres=# select slot_name from pg_replication_slots;
> >    slot_name
> > ---------------
> >  logical_slot2
> >
> > postgres=# show effective_wal_level;
> >  effective_wal_level
> > ---------------------
> >  replica
> > (1 row)
> >
> > postgres=# select pg_logical_slot_get_changes('logical_slot2', NULL,
> > NULL, 'proto_version', '4', 'publication_names', 'pub');
> > ERROR:  logical decoding is not enabled
> > HINT:  Set "wal_level" >= "logical" or create at least one logical slot.
> >
> > Shall we acquire LogicalDecodingControlLock in exclusive mode at a
> > little earlier stage? Currently we acquire it after
> > IsLogicalDecodingEnabled() check. I think we shall acquire it before
> > this check in in both enable and disable flow?
>
> Thank you for testing the patch!
>
> I've reworked the locking part in the patch. The attached v4 patch
> should address all review comments including your previous
> comments[1].

Few comments:
1) pg_waldump not handled for the new WAL record added
XLOG_LOGICAL_DECODING_STATUS_CHANGE:
+               XLogRegisterData(&logical_decoding, sizeof(bool));
+               recptr = XLogInsert(RM_XLOG_ID,
XLOG_LOGICAL_DECODING_STATUS_CHANGE);
+               XLogFlush(recptr);

rmgr: XLOG        len (rec/tot):     54/    54, tx:          0, lsn:
0/017633D8, prev 0/01763360, desc: PARAMETER_CHANGE
max_connections=100 max_worker_processes=8 max_wal_senders=10
max_prepared_xacts=10 max_locks_per_xact=64 wal_level=replica
wal_log_hints=off track_commit_timestamp=off
rmgr: XLOG        len (rec/tot):     27/    27, tx:          0, lsn:
0/01763410, prev 0/017633D8, desc: UNKNOWN (f0)
rmgr: Standby     len (rec/tot):     50/    50, tx:          0, lsn:
0/01763430, prev 0/01763410, desc: RUNNING_XACTS nextXid 754
latestCompletedXid 753 oldestRunningXid 754

2) Similarly pg_walinspect also should be handled for the new WAL record added:
postgres=# SELECT * FROM pg_get_wal_records_info('0/017633D8', '0/01763468');
 start_lsn  |  end_lsn   |  prev_lsn  | xid | resource_manager |
record_type    | record_length | main_data_length | fpi_length |
                    description
                                        | block_ref
------------+------------+------------+-----+------------------+------------------+---------------+------------------+------------+-----------------------------------------------------------
---------------------------------------------------------------------------------------------------------------+-----------
 0/017633D8 | 0/01763410 | 0/01763360 |   0 | XLOG             |
PARAMETER_CHANGE |            54 |               28 |          0 |
max_connections=100 max_worker_processes=8 max_wal_senders
=10 max_prepared_xacts=10 max_locks_per_xact=64 wal_level=replica
wal_log_hints=off track_commit_timestamp=off |
 0/01763410 | 0/01763430 | 0/017633D8 |   0 | XLOG             |
UNKNOWN (f0)     |            27 |                1 |          0 |

                                        |
 0/01763430 | 0/01763468 | 0/01763410 |   0 | Standby          |
RUNNING_XACTS    |            50 |               24 |          0 |
nextXid 754 latestCompletedXid 753 oldestRunningXid 754

                                        |
(3 rows)

3) Should this be the other way around? Would it be better to throw
the error earlier, instead of waiting for the running transactions to
finish?
@@ -136,6 +137,9 @@ create_logical_replication_slot(char *name, char *plugin,
                                                  temporary ?
RS_TEMPORARY : RS_EPHEMERAL, two_phase,
                                                  failover, false);

+       EnsureLogicalDecodingEnabled();
+       CheckLogicalDecodingRequirements();

4) The includes xlog_internal, xlogutils, atomics, lwlock, procsignal,
shmem, standby and guc is not required, I was able to compile without
it:
+ *       src/backend/replication/logical/logicalctl.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/xlog_internal.h"
+#include "access/xlogutils.h"
+#include "access/xloginsert.h"
+#include "catalog/pg_control.h"
+#include "port/atomics.h"
+#include "miscadmin.h"
+#include "storage/lwlock.h"
+#include "storage/procarray.h"
+#include "storage/procsignal.h"
+#include "storage/ipc.h"
+#include "storage/lmgr.h"
+#include "storage/shmem.h"
+#include "storage/standby.h"
+#include "replication/logicalctl.h"
+#include "replication/slot.h"
+#include "utils/guc.h"
+#include "utils/wait_event_types.h"

5) I felt this change is not related to this patch:
@@ -1144,7 +1152,7 @@ slotsync_reread_config(void)
        if (old_sync_replication_slots != sync_replication_slots)
        {
                ereport(LOG,
-               /* translator: %s is a GUC variable name */
+                               /* translator: %s is a GUC variable name */
                                errmsg("replication slot
synchronization worker will shut down because \"%s\" is disabled",
"sync_replication_slots"));
                proc_exit(0);

6) Can we include the high level design in the commit message and also
the other possible designs that were considered before finalizing on
this, it will help new reviewers to get a head start as the thread is
a long thread.

7) I did not see documentation added, can we add the required
documentation for this.

8) The new test file added should be included in meson.build file

Regards,
Vignesh


Reply via email to