On Mon, Feb 19, 2024 at 9:46 AM shveta malik <shveta.ma...@gmail.com> wrote: > > Okay I see. Thanks for pointing it out. Here are the patches > addressing your comments. Changes are in patch001, rest are rebased. >
Few comments on 0001 ==================== 1. I think it is better to error out when the valid GUC or option is not set in ensure_valid_slotsync_params() and ensure_valid_remote_info() instead of waiting. And we shouldn't start the worker in the first place if not all required GUCs are set. This will additionally simplify the code a bit. 2. +typedef struct SlotSyncWorkerCtxStruct { - /* prevents concurrent slot syncs to avoid slot overwrites */ + pid_t pid; + bool stopSignaled; bool syncing; + time_t last_start_time; slock_t mutex; -} SlotSyncCtxStruct; +} SlotSyncWorkerCtxStruct; I think we don't need to change the name of this struct as this can be used both by the worker and the backend. We can probably add the comment to indicate that all the fields except 'syncing' are used by slotsync worker. 3. Similar to above, function names like SlotSyncShmemInit() shouldn't be changed to SlotSyncWorkerShmemInit(). 4. +ReplSlotSyncWorkerMain(int argc, char *argv[]) { ... + on_shmem_exit(slotsync_worker_onexit, (Datum) 0); ... + before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn)); ... } Do we need two separate callbacks? Can't we have just one (say slotsync_failure_callback) that cleans additional things in case of slotsync worker? And, if we need both those callbacks then please add some comments for both and why one is before_shmem_exit() and the other is on_shmem_exit(). In addition to the above, I have made a few changes in the comments and code (cosmetic code changes). Please include those in the next version if you find them okay. You need to rename .txt file to remove .txt and apply atop v90-0001*. -- With Regards, Amit Kapila.
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index 3080d4aa53..790799c164 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -463,9 +463,9 @@ static void InitPostmasterDeathWatchHandle(void); /* * Slot sync worker allowed to start up? * - * If we are on a hot standby, slot sync parameter is enabled, and it is - * the first time of worker's launch, or enough time has passed since the - * worker was launched last, then it is allowed to be started. + * We allow to start the slot sync worker when we are on a hot standby, + * sync_replication_slots is enabled, and it is the first time of worker's + * launch, or enough time has passed since the worker was launched last. */ #define SlotSyncWorkerAllowed() \ (sync_replication_slots && pmState == PM_HOT_STANDBY && \ diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c index 40ab87bce1..0e0f3cdf76 100644 --- a/src/backend/replication/logical/slotsync.c +++ b/src/backend/replication/logical/slotsync.c @@ -1052,9 +1052,9 @@ ensure_valid_slotsync_params(void) /* * Re-read the config file. * - * If any of the slot sync GUCs have changed, exit the worker and - * let it get restarted by the postmaster. The worker to be exited for - * restart purpose only if the caller passed restart as true. + * Exit if any of the slot sync GUCs have changed. The postmaster will + * restart it. The worker to be exited for restart purpose only if the + * caller passed restart as true. */ static void slotsync_reread_config(bool restart) @@ -1295,9 +1295,9 @@ ReplSlotSyncWorkerMain(int argc, char *argv[]) dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); /* - * Connect to the database specified by user in primary_conninfo. We need - * a database connection for walrcv_exec to work. Please see comments atop - * libpqrcv_exec. + * Connect to the database specified by the user in primary_conninfo. We + * need a database connection for walrcv_exec to work which we use to fetch + * slot information from the remote node. See comments atop libpqrcv_exec. */ InitPostgres(dbname, InvalidOid, NULL, InvalidOid, 0, NULL); @@ -1310,12 +1310,11 @@ ReplSlotSyncWorkerMain(int argc, char *argv[]) appendStringInfo(&app_name, "%s", "slotsyncworker"); /* - * Establish the connection to the primary server for slots + * Establish the connection to the primary server for slot * synchronization. */ wrconn = walrcv_connect(PrimaryConnInfo, false, false, false, - app_name.data, - &err); + app_name.data, &err); pfree(app_name.data); if (!wrconn) @@ -1332,7 +1331,7 @@ ReplSlotSyncWorkerMain(int argc, char *argv[]) */ ensure_valid_remote_info(wrconn); - /* Main wait loop */ + /* Main loop to synchronize slots */ for (;;) { bool some_slot_updated = false; @@ -1345,7 +1344,7 @@ ReplSlotSyncWorkerMain(int argc, char *argv[]) } /* - * The slot sync worker can not get here because it will only stop when it + * The slot sync worker can't get here because it will only stop when it * receives a SIGINT from the startup process, or when there is an error. */ Assert(false);