Hi. Here are my review comments for patch v43-0002.
====== Commit message 1. The nap time of worker is tuned according to the activity on the primary. The worker starts with nap time of 10ms and if no activity is observed on the primary for some time, then nap time is increased to 10sec. And if activity is observed again, nap time is reduced back to 10ms. ~ /nap time of worker/nap time of the worker/ /And if/If/ ~~~ 2. Slots synced on the standby can be identified using 'sync_state' column of pg_replication_slots view. The values are: 'n': none for user slots, 'i': sync initiated for the slot but waiting for the remote slot on the primary server to catch up. 'r': ready for periodic syncs. ~ /identified using/identified using the/ The meaning of "identified by" is unclear to me. It also seems to clash with later descriptions in system-views.sgml. Please see my later review comment about it (in the sgml file) ====== doc/src/sgml/bgworker.sgml 3. bgw_start_time is the server state during which postgres should start the process; it can be one of BgWorkerStart_PostmasterStart (start as soon as postgres itself has finished its own initialization; processes requesting this are not eligible for database connections), BgWorkerStart_ConsistentState (start as soon as a consistent state has been reached in a hot standby, allowing processes to connect to databases and run read-only queries), and BgWorkerStart_RecoveryFinished (start as soon as the system has entered normal read-write state. Note that the BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are equivalent in a server that's not a hot standby), and BgWorkerStart_ConsistentState_HotStandby (same meaning as BgWorkerStart_ConsistentState but it is more strict in terms of the server i.e. start the worker only if it is hot-standby; if it is consistent state in non-standby, worker will not be started). Note that this setting only indicates when the processes are to be started; they do not stop when a different state is reached. ~ 3a. This seems to have grown to become just one enormous sentence that is too hard to read. IMO this should be changed to be a <variablelist> of possible values instead of a big slab of text. I suspect it could also be simplified quite a lot -- something like below SUGGESTION bgw_start_time is the server state during which postgres should start the process. Note that this setting only indicates when the processes are to be started; they do not stop when a different state is reached. Possible values are: - BgWorkerStart_PostmasterStart (start as soon as postgres itself has finished its own initialization; processes requesting this are not eligible for database connections) - BgWorkerStart_ConsistentState (start as soon as a consistent state has been reached in a hot-standby, allowing processes to connect to databases and run read-only queries) - BgWorkerStart_RecoveryFinished (start as soon as the system has entered normal read-write state. Note that the BgWorkerStart_ConsistentState and BgWorkerStart_RecoveryFinished are equivalent in a server that's not a hot standby) - BgWorkerStart_ConsistentState_HotStandby (same meaning as BgWorkerStart_ConsistentState but it is more strict in terms of the server i.e. start the worker only if it is hot-standby; if it is a consistent state in non-standby, the worker will not be started). ~~~ 3b. "i.e. start the worker only if it is hot-standby; if it is consistent state in non-standby, worker will not be started" ~ Why is it even necessary to say the 2nd part "if it is consistent state in non-standby, worker will not be started". It seems redundant given 1st part says the same, right? ====== doc/src/sgml/config.sgml 4. + <para> + The standbys corresponding to the physical replication slots in + <varname>standby_slot_names</varname> must enable + <varname>enable_syncslot</varname> for the standbys to receive + failover logical slots changes from the primary. + </para> 4a. Somehow "must enable enable_syncslot" seemed strange. Maybe re-word like: "must enable slot synchronization (see enable_syncslot)" OR "must configure enable_syncslot = true" ~~~ 4b. (seems like repetitive use of "the standbys") /for the standbys to/to/ OR /for the standbys to/so they can/ ~~~ 5. <varname>primary_conninfo</varname> string, or in a separate - <filename>~/.pgpass</filename> file on the standby server (use + <filename>~/.pgpass</filename> file on the standby server. (use This rearranged period seems unrelated to the current patch. Maybe don't touch this. ~~~ 6. + <para> + Specify <literal>dbname</literal> in + <varname>primary_conninfo</varname> string to allow synchronization + of slots from the primary server to the standby server. + This will only be used for slot synchronization. It is ignored + for streaming. </para> The wording "to allow synchronization of slots" seemed misleading to me. Isn't that more the purpose of the 'enable_syncslot' GUC? I think the intended wording is more like below: SUGGESTION If slot synchronization is enabled then it is also necessary to specify <literal>dbname</literal> in the <varname>primary_conninfo</varname> string. This will only be used for slot synchronization. It is ignored for streaming. ====== doc/src/sgml/logicaldecoding.sgml 7. + <para> + A logical replication slot on the primary can be synchronized to the hot + standby by enabling the failover option during slot creation and set + <varname>enable_syncslot</varname> on the standby. For the synchronization + to work, it is mandatory to have physical replication slot between the + primary and the standby. This physical replication slot for the standby + should be listed in <varname>standby_slot_names</varname> on the primary + to prevent the subscriber from consuming changes faster than the hot + standby. Additionally, similar to creating a logical replication slot + on the hot standby, <varname>hot_standby_feedback</varname> should be + set on the standby and a physical slot between the primary and the standby + should be used. + </para> 7a. /creation and set/creation and setting/ /to have physical replication/to have a physical replication/ ~ 7b. It's unclear why this is saying "should be listed in standby_slot_names" and "hot_standby_feedback should be set on the standby". Why is it saying "should" instead of MUST -- are these optional? I thought the GUC validation function mandates these (???). ~ 7c. Why does the paragraph say "and a physical slot between the primary and the standby should be used."; isn't that exactly what was already written earlier ("For the synchronization to work, it is mandatory to have physical replication slot between the primary and the standby" ~~~ 8. + <para> + By enabling synchronization of slots, logical replication can be resumed + after failover depending upon the + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>sync_state</structfield> + for the synchronized slots on the standby at the time of failover. + The slots which were in ready sync_state ('r') on the standby before + failover can be used for logical replication after failover. However, + the slots which were in initiated sync_state ('i) and were not + sync-ready ('r') at the time of failover will be dropped and logical + replication for such slots can not be resumed after failover. This applies + to the case where a logical subscription is disabled before failover and is + enabled after failover. If the synchronized slot due to disabled + subscription could not be made sync-ready ('r') on standby, then the + subscription can not be resumed after failover even when enabled. 8a. This feels overcomplicated -- too much information? SUGGESTION depending upon the ... sync_state for the synchronized slots on the standby at the time of failover. Only slots that were in ready sync_state ('r') on the standby before failover can be used for logical replication after failover ~~~ 8b. + the slots which were in initiated sync_state ('i) and were not + sync-ready ('r') at the time of failover will be dropped and logical + replication for such slots can not be resumed after failover. This applies + to the case where a logical subscription is disabled before failover and is + enabled after failover. If the synchronized slot due to disabled + subscription could not be made sync-ready ('r') on standby, then the + subscription can not be resumed after failover even when enabled. But isn't ALL that part pretty much redundant information for the user? I thought these are not ready state, so they are not usable... End-Of-Story. Isn't everything else just more like implementation details, which the user does not need to know about? ~~~ 9. + If the primary is idle, making the synchronized slot on the standby + as sync-ready ('r') for enabled subscription may take noticeable time. + This can be sped up by calling the + <function>pg_log_standby_snapshot</function> function on the primary. + </para> SUGGESTION If the primary is idle, then the synchronized slots on the standby may take a noticeable time to reach the ready ('r') sync_state. This can be sped up by calling the <function>pg_log_standby_snapshot</function> function on the primary. ====== doc/src/sgml/system-views.sgml 10. + + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>sync_state</structfield> <type>char</type> + </para> + <para> + Defines slot synchronization state. This is meaningful on the physical + standby which has enabled slots synchronization. + </para> I felt that this part "which has enabled slots synchronization" should cross-reference to the 'sync_enabled' GUC. ~~~ 11. + <para> + State code: + <literal>n</literal> = none for user created slots, + <literal>i</literal> = sync initiated for the slot but slot is not ready + yet for periodic syncs, + <literal>r</literal> = ready for periodic syncs. + </para> I'm wondering why don't we just reuse 'd' (disabled), 'p' (pending), 'e' (enabled) like the other tri-state attributes are using. ~~~ 12. + <para> + The hot standby can have any of these sync_state for the slots but on a + hot standby, the slots with state 'r' and 'i' can neither be used for logical + decoded nor dropped by the user. The primary server will have sync_state + as 'n' for all the slots. But if the standby is promoted to become the + new primary server, sync_state can be seen 'r' as well. On this new + primary server, slots with sync_state as 'r' and 'n' will behave the same. + </para></entry> + </row> 12a. /logical decoded/logical decoding/ ~ 12b. "sync_state as 'r' and 'n' will behave the same" sounds kind of hacky. Is there no alternative? Anyway, IMO mentioning about primary server states seems overkill, because you already said "This is meaningful on the physical standby" which I took as implying that it is *not* meaningful from the POV of the primary server. In light of this, I'm wondering if a better name for this attribute would be: 'standby_sync_state' ====== src/backend/access/transam/xlogrecovery.c 13. + /* + * Shutdown the slot sync workers to prevent potential conflicts between + * user processes and slotsync workers after a promotion. Additionally, + * drop any slots that have initiated but not yet completed the sync + * process. + */ + ShutDownSlotSync(); + slotsync_drop_initiated_slots(); + Is this where maybe the 'sync_state' should also be updated for everything so you are not left with confusion about different states on a node that is no longer a standby node? ====== src/backend/postmaster/postmaster.c 14. PostmasterMain ApplyLauncherRegister(); + SlotSyncWorkerRegister(); + Every other function call here is heavily commented but there is a conspicuous absence of a comment here. ~~~ 15. bgworker_should_start_now if (start_time == BgWorkerStart_ConsistentState) return true; + else if (start_time == BgWorkerStart_ConsistentState_HotStandby && + pmState != PM_RUN) + return true; /* fall through */ Change "else if" to "if" would be simpler. ====== .../libpqwalreceiver/libpqwalreceiver.c 16. + for (opt = opts; opt->keyword != NULL; ++opt) + { + /* + * If multiple dbnames are specified, then the last one will be + * returned + */ + if (strcmp(opt->keyword, "dbname") == 0 && opt->val && + opt->val[0] != '\0') + dbname = pstrdup(opt->val); + } This can use a tidier C99 style to declare 'opt' as the loop variable. ~~~ 17. static void libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, - bool failover) + bool failover) What is this change for? Or, if something is wrong with the indent then anyway it should be fixed in patch 0001. ====== src/backend/replication/logical/logical.c 18. + /* + * Slots in state SYNCSLOT_STATE_INITIATED should have been dropped on + * promotion. + */ + if (!RecoveryInProgress() && slot->data.sync_state == SYNCSLOT_STATE_INITIATED) + elog(ERROR, "replication slot \"%s\" was not synced completely from the primary server", + NameStr(slot->data.name)); + + /* + * Do not allow consumption of a "synchronized" slot until the standby + * gets promoted. + */ + if (RecoveryInProgress() && slot->data.sync_state != SYNCSLOT_STATE_NONE) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot use replication slot \"%s\" for logical decoding", + NameStr(slot->data.name)), + errdetail("This slot is being synced from the primary server."), + errhint("Specify another replication slot."))); + 18a. Instead of having !RecoveryInProgress() and RecoveryInProgress() in separate conditions is the code simpler like: SUGGESTION if (RecoveryInProgress()) { /* Do not allow ... */ if (slot->data.sync_state != SYNCSLOT_STATE_NONE) ... } else { /* Slots in state... */ if (slot->data.sync_state == SYNCSLOT_STATE_INITIATED) ... } ~ 18b. Should the errdetail give the current state? ====== src/backend/replication/logical/slotsync.c 19. +/* + * Number of attempts for wait_for_primary_slot_catchup() after + * which it aborts the wait and the slot sync worker then moves + * to the next slot creation/sync. + */ +#define WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS 5 Given this is only used within one static function, I'm wondering if it would be tidier to also move this macro to within that function. ~~~ 20. wait_for_primary_slot_catchup +/* + * Wait for remote slot to pass locally reserved position. + * + * Ping and wait for the primary server for + * WORKER_PRIMARY_CATCHUP_WAIT_ATTEMPTS during a slot creation, if it still + * does not catch up, abort the wait. The ones for which wait is aborted will + * attempt the wait and sync in the next sync-cycle. + * + * *persist will be set to false if the slot has disappeared or was invalidated + * on the primary; otherwise, it will be set to true. + */ 20a. The comment doesn't say the meaning of the boolean returned. ~ 20b. /*persist will be set/If passed, *persist will be set/ ~~~ 21. + appendStringInfo(&cmd, + "SELECT conflicting, restart_lsn, confirmed_flush_lsn," + " catalog_xmin FROM pg_catalog.pg_replication_slots" + " WHERE slot_name = %s", + quote_literal_cstr(remote_slot->name)); Somehow, I felt it is more readable if the " FROM" starts on a new line. e.g. "SELECT conflicting, restart_lsn, confirmed_flush_lsn, catalog_xmin" " FROM pg_catalog.pg_replication_slots" " WHERE slot_name = %s" ~~~ 22. + ereport(ERROR, + (errmsg("could not fetch slot info for slot \"%s\" from the" + " primary server: %s", + remote_slot->name, res->err))); Perhaps the message can be shortened like: "could not fetch slot \"%s\" info from the primary server: %s" ~~~ 23. + ereport(WARNING, + (errmsg("slot \"%s\" disappeared from the primary server," + " slot creation aborted", remote_slot->name))); Would this be better split into parts? SUGGESTION errmsg "slot \"%s\" creation aborted" errdetail "slot was not found on the primary server" ~~~ 24. + ereport(WARNING, + (errmsg("slot \"%s\" invalidated on the primary server," + " slot creation aborted", remote_slot->name))); (similar to previous) SUGGESTION errmsg "slot \"%s\" creation aborted" errdetail "slot was invalidated on the primary server" ~~~ 25. + /* + * Once we got valid restart_lsn, then confirmed_lsn and catalog_xmin + * are expected to be valid/non-null. + */ SUGGESTION Having got a valid restart_lsn, the confirmed_lsn and catalog_xmin are expected to be valid/non-null. ~~~ 26. slotsync_drop_initiated_slots +/* + * Drop the slots for which sync is initiated but not yet completed + * i.e. they are still waiting for the primary server to catch up. + */ I found "waiting for the primary server to catch up" to be difficult to understand without knowing the full details, but it is not really described properly until a much larger comment that is buried in the synchronize_one_slot(). So I think all this needs explanation up-front in the file, which you can refer to. I have repeated this same review comment in a couple of places. ~~~ 27. get_local_synced_slot_names +static List * +get_local_synced_slot_names(void) +{ + List *localSyncedSlots = NIL; 27a. It's not returning a list of "names" though, so is this an appropriate function name? ~~~ 27b. Suggest just call that ('localSyncedSlots') differently. - In slotsync_drop_initiated_slots() function they are just called 'slots' - In drop_obsolete_slots() function it is called 'local_slot_list' IMO it is better if all these are consistently named -- just all lists 'slots' or all 'local_slots' or whatever. ~~~ 28. check_sync_slot_validity +static bool +check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots, + bool *locally_invalidated) Somehow this wording "validity" seems like a misleading function name, because the return value has nothing to do with the slot field invalidated. The validity/locally_invalidated stuff is a secondary return as a side effect for the "true" case. A more accurate function name would be more like check_sync_slot_on_remote(). ~~~ 29. check_sync_slot_validity +static bool +check_sync_slot_validity(ReplicationSlot *local_slot, List *remote_slots, + bool *locally_invalidated) +{ + ListCell *cell; There is inconsistent naming -- ListCell lc; ListCell cell; ListCell lc_slot; etc.. IMO the more complicated names aren't of much value -- probably everything can be changed to 'lc' for consistency. ~~~ 30. drop_obsolete_slots + /* + * Get the list of local 'synced' slot so that those not on remote could + * be dropped. + */ /slot/slots/ Also, I don't think it is necessary to say "so that those not on remote could be dropped." -- That is already described in the function comment and again in a comment later in the loop. That seems enough. If the function name get_local_synced_slot_names() is improved a bit the comment seems redundant because it is obvious from the function name. ~~~ 31. + foreach(lc_slot, local_slot_list) + { + ReplicationSlot *local_slot = (ReplicationSlot *) lfirst(lc_slot); + bool local_exists = false; + bool locally_invalidated = false; + + local_exists = check_sync_slot_validity(local_slot, remote_slot_list, + &locally_invalidated); Shouldn't that 'local_exists' variable be called 'remote_exists'? That's what the other comments seem to be saying. ~~~ 32. construct_slot_query + appendStringInfo(s, + "SELECT slot_name, plugin, confirmed_flush_lsn," + " restart_lsn, catalog_xmin, two_phase, failover," + " database, pg_get_slot_invalidation_cause(slot_name)" + " FROM pg_catalog.pg_replication_slots" + " WHERE failover and sync_state != 'i'"); Just wondering if substituting the SYNCSLOT_STATE_INITIATED constant here might be more appropriate than hardwiring 'i'. Why have a constant but not use it? ~~~ 33. synchronize_one_slot +static void +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot, + bool *slot_updated) +{ + ReplicationSlot *s; + char sync_state = 0; 33a. It seems strange that the sync_state is initially assigned something other than the 3 legal values. Should this be defaulting to SYNCSLOT_STATE_NONE instead? ~ 33b. I think it is safer to default the *slot_updated = false; because the code appears to assume it was false already which may or may not be true. ~~~ 34. + /* + * Make sure that concerned WAL is received before syncing slot to target + * lsn received from the primary server. + * + * This check should never pass as on the primary server, we have waited + * for the standby's confirmation before updating the logical slot. + */ Maybe this comment should mention up-front that it is just a "Sanity check:" ~~~ 35. + /* + * With hot_standby_feedback enabled and invalidations handled + * apropriately as above, this should never happen. + */ + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn) + { + ereport(ERROR, + errmsg("not synchronizing local slot \"%s\" LSN(%X/%X)" + " to remote slot's LSN(%X/%X) as synchronization " + " would move it backwards", remote_slot->name, + LSN_FORMAT_ARGS(MyReplicationSlot->data.restart_lsn), + LSN_FORMAT_ARGS(remote_slot->restart_lsn))); + + goto cleanup; + } 35a. IIUC then this another comment that should say it is just a "Sanity-check:". ~ 35b. I was wondering if there should be Assert(hot_standby_feedback) here also. The comment "With hot_standby_feedback enabled" is a bit vague whereas including an Assert will clarify that it must be set. ~ 35c. Since it says "this should never happen" then it appears elog is more appropriate than ereport because translations are not needed, right? ~ 35d. The ERROR will make that goto cleanup unreachable, won't it? ~~~ 36. + /* + * Already existing slot but not ready (i.e. waiting for the primary + * server to catch-up), lets attempt to make it sync-ready now. + */ /lets/let's/ ~~~ 37. + /* + * Refer the slot creation part (last 'else' block) for more details + * on this wait. + */ + if (remote_slot->restart_lsn < MyReplicationSlot->data.restart_lsn || + TransactionIdPrecedes(remote_slot->catalog_xmin, + MyReplicationSlot->data.catalog_xmin)) + { + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, NULL)) + { + goto cleanup; + } + } 37a. Having to jump forward to understand earlier code seems backward. IMO there should be a big comment atop this module about this subject which the comment here can just refer to. I will write more about this topic later (below). ~ 37b. The extra code curly braces are not needed. ~~~ 38. + ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready " + "now", remote_slot->name)); Better to put the whole errmsg() on a newline instead of splitting the string like that. ~~~ 39. + /* User created slot with the same name exists, raise ERROR. */ + else if (sync_state == SYNCSLOT_STATE_NONE) + { + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("skipping sync of slot \"%s\" as it is a user created" + " slot", remote_slot->name), + errdetail("This slot has failover enabled on the primary and" + " thus is sync candidate but user created slot with" + " the same name already exists on the standby"))); + } I felt it would be better to eliminate this case immediately up-front when you first searched for the slot names. e.g. code like below. IIUC this refactor also means the default sync_state can be assigned a normal value (as I suggested above) instead of the strange assignment to 0. + /* Search for the named slot */ + if ((s = SearchNamedReplicationSlot(remote_slot->name, true))) + { + SpinLockAcquire(&s->mutex); + sync_state = s->data.sync_state; + SpinLockRelease(&s->mutex); INSERT HERE + /* User-created slot with the same name exists, raise ERROR. */ + if (sync_state == SYNCSLOT_STATE_NONE) + ereport(ERROR, ... + } ~~~ 40. + /* Otherwise create the slot first. */ + else + { Insert a blank line above that comment for better readability (same as done for earlier 'else' in this same function) ~~~ 41. + ReplicationSlotCreate(remote_slot->name, true, RS_EPHEMERAL, + remote_slot->two_phase, + remote_slot->failover, + SYNCSLOT_STATE_INITIATED); + + slot = MyReplicationSlot; In hindsight, the prior if/else code blocks in this function also could have done "slot = MyReplicationSlot;" same as this -- then the code would be much less verbose. ~~~ 42. + SpinLockAcquire(&slot->mutex); + slot->data.database = get_database_oid(remote_slot->database, false); + + namestrcpy(&slot->data.plugin, remote_slot->plugin); + SpinLockRelease(&slot->mutex); IMO the code would be more readable *without* a blank line here because the mutexed block is more obvious. ~~~ 43. + /* + * If the local restart_lsn and/or local catalog_xmin is ahead of + * those on the remote then we cannot create the local slot in sync + * with the primary server because that would mean moving the local + * slot backwards and we might not have WALs retained for old LSN. In + * this case we will wait for the primary server's restart_lsn and + * catalog_xmin to catch up with the local one before attempting the + * sync. + */ 43a. This comment describes some fundamental concepts about how this logic works. I felt this and other comments like this should be at the top of this slotsync.c file. Then anything that needs to mention about it can refer to the top comment. For example, I also found other comments like "... they are still waiting for the primary server to catch up." to be difficult to understand without knowing these details, but I think describing core design stuff up-front and saying "refer to the comment atop the fil" probably would help a lot. ~ 43b. Should "wait for the primary server's restart_lsn and..." be "wait for the primary server slot's restart_lsn and..." ? ~~~ 44. + { + bool persist; + + if (!wait_for_primary_slot_catchup(wrconn, remote_slot, &persist)) + { + /* + * The remote slot didn't catch up to locally reserved + * position. + * + * We do not drop the slot because the restart_lsn can be + * ahead of the current location when recreating the slot in + * the next cycle. It may take more time to create such a + * slot. Therefore, we persist it (provided remote-slot is + * still valid) and attempt the wait and synchronization in + * the next cycle. + */ + if (persist) + { + ReplicationSlotPersist(); + *slot_updated = true; + } + + goto cleanup; + } + } Looking at the way this 'persist' parameter is used I felt is it too complicated. IIUC the wait_for_primary_slot_catchup can only return *persist = true (for a false return) when it has reached/exceeded the number of retries and still not yet caught up. Why should wait_for_primary_slot_catchup() pretend to know about persistence? In other words, I thought a more meaningful parameter/variable name (instead of 'persist') is something like 'wait_attempts_exceeded'. IMO that will make wait_for_primary_slot_catchup() code easier, and here you can just say like below, where the code matches the comment better. Thoughts? + if (wait_attempts_exceeded) + { + ReplicationSlotPersist(); + *slot_updated = true; + } ~~~ 45. + + + /* + * Wait for primary is either not needed or is over. Update the lsns + * and mark the slot as READY for further syncs. + */ Double blank lines? ~~~ 46. + ereport(LOG, errmsg("newly locally created slot \"%s\" is sync-ready " + "now", remote_slot->name)); + } + +cleanup: Better to put the whole errmsg() on a newline instead of splitting the string like that. ~~~ 47. synchronize_slots +/* + * Synchronize slots. + * + * Gets the failover logical slots info from the primary server and update + * the slots locally. Creates the slots if not present on the standby. + * + * Returns nap time for the next sync-cycle. + */ +static long +synchronize_slots(WalReceiverConn *wrconn) /update/updates/ ~~~ 48. + /* The primary_slot_name is not set yet or WALs not received yet */ + SpinLockAcquire(&WalRcv->mutex); + if (!WalRcv || + (WalRcv->slotname[0] == '\0') || + XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) + { + SpinLockRelease(&WalRcv->mutex); + return naptime; + } + SpinLockRelease(&WalRcv->mutex); Just wondering if the scenario of "WALS not received" is a bit more like "no activity" so perhaps the naptime returned should be WORKER_INACTIVITY_NAPTIME_MS here? ~~~ 49. + /* Construct query to get slots info from the primary server */ + initStringInfo(&s); + construct_slot_query(&s); I did not like the construct_slot_query() to be separated from this function because it makes it too difficult to see if the slot_attr numbers and column types in this function are correct w.r.t. that query. IMO better when everything is in the same place where you can see it all together. e.g. Less risk of breaking something if changes are made. ~~~ 50. + /* Construct the remote_slot tuple and synchronize each slot locally */ + slot = MakeSingleTupleTableSlot(res->tupledesc, &TTSOpsMinimalTuple); Normally in all the other functions the variable 'slot' was the local ReplicationSlot but IIUC here represents a remote tuple. Making a different name would be better like 'remote_slottup' or something else. ~~~ 51. + /* + * If any of the slots get updated in this sync-cycle, retain default + * naptime and update 'last_update_time' in slot sync worker. But if no + * activity is observed in this sync-cycle, then increase naptime provided + * inactivity time reaches threshold. + */ I think "retain" is a slightly wrong word here because it might have been WORKER_INACTIVITY_NAPTIME_MS in the previous cycle. Maybe just /retain/use/ ~~~ 52. +/* + * Connects primary to validate the slot specified in primary_slot_name. + * + * Exits the worker if physical slot with the specified name does not exist. + */ +static void +validate_primary_slot(WalReceiverConn *wrconn) There is already a connection, so not sure if this connect should be saying "connects to"; Maybe is should be saying more like below: SUGGESTION Using the specified primary server connection, validate if the physical slot identified by GUC primary_slot_name exists. Exit the worker if the slot is not found. ~~~ 53. + initStringInfo(&cmd); + appendStringInfo(&cmd, + "select count(*) = 1 from pg_replication_slots where " + "slot_type='physical' and slot_name=%s", + quote_literal_cstr(PrimarySlotName)); Write the SQL keywords in uppercase. ~~~ 54. + if (res->status != WALRCV_OK_TUPLES) + ereport(ERROR, + (errmsg("could not fetch primary_slot_name info from the " + "primary: %s", res->err))); Shouldn't the name of the unfound slot be shown in the ereport, or will that already appear in the res->err? ~~~ 55. + ereport(ERROR, + errmsg("exiting slots synchronization as slot specified in " + "primary_slot_name is not valid")); + IMO the format should be the same as I suggested (later) for all the validate_slotsync_parameters() errors. Also, I think the name of the unfound slot needs to be in this message. So maybe result is like this: SUGGESTION ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: second %s is a GUC variable name */ errhint("The primary slot \"%s\" specified by %s is not valid.", slot_name, "primary_slot_name") ); ~~~ 56. +/* + * Checks if GUCs are set appropriately before starting slot sync worker + */ +static void +validate_slotsync_parameters(char **dbname) +{ + /* + * Since 'enable_syncslot' is ON, check that other GUC settings + * (primary_slot_name, hot_standby_feedback, wal_level, primary_conninfo) + * are compatible with slot synchronization. If not, raise ERROR. + */ + 56a. I thought that 2nd comment sort of belonged in the function comment. ~ 56b. It says "Since 'enable_syncslot' is ON", but I IIUC that is wrong because the other function slotsync_reread_config() might detect a change in this GUC and cause this validate_slotsync_parameters() to be called when enable_syncslot was changed to false. In other words, I think you also need to check 'enable_syncslot' and exit with appropriate ERROR same as all the other config problems. OTOH if this is not possible, then the slotsync_reread_config() might need fixing instead. ~~~ 57. + /* + * A physical replication slot(primary_slot_name) is required on the + * primary to ensure that the rows needed by the standby are not removed + * after restarting, so that the synchronized slot on the standby will not + * be invalidated. + */ + if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0) + ereport(ERROR, + errmsg("exiting slots synchronization as primary_slot_name is " + "not set")); + + /* + * Hot_standby_feedback must be enabled to cooperate with the physical + * replication slot, which allows informing the primary about the xmin and + * catalog_xmin values on the standby. + */ + if (!hot_standby_feedback) + ereport(ERROR, + errmsg("exiting slots synchronization as hot_standby_feedback " + "is off")); + + /* + * Logical decoding requires wal_level >= logical and we currently only + * synchronize logical slots. + */ + if (wal_level < WAL_LEVEL_LOGICAL) + ereport(ERROR, + errmsg("exiting slots synchronisation as it requires " + "wal_level >= logical")); + + /* + * The primary_conninfo is required to make connection to primary for + * getting slots information. + */ + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0) + ereport(ERROR, + errmsg("exiting slots synchronization as primary_conninfo " + "is not set")); + + /* + * The slot sync worker needs a database connection for walrcv_exec to + * work. + */ + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + if (*dbname == NULL) + ereport(ERROR, + errmsg("exiting slots synchronization as dbname is not " + "specified in primary_conninfo")); + +} IMO all these errors can be improved by: - using a common format - including errhint for the reason - using the same tone for instructions on what to do (e.g saying must be set, rather than what was not set) SUGGESTION (something like this) ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: %s is a GUC variable name */ errhint("%s must be defined.", "primary_slot_name") ); ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: %s is a GUC variable name */ errhint("%s must be enabled.", "hot_standby_feedback") ); ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: wal_level is a GUC variable name, 'logical' is a value */ errhint("wal_level must be >= logical.") ); ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: %s is a GUC variable name */ errhint("%s must be defined.", "primary_conninfo") ); ereport(ERROR, errmsg("exiting from slot synchronization due to bad configuration") /* translator: 'dbname' is a specific option; %s is a GUC variable name */ errhint("'dbname' must be specified in %s.", "primary_conninfo") ); ~~~ 58. + *dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + if (*dbname == NULL) + ereport(ERROR, + errmsg("exiting slots synchronization as dbname is not specified in primary_conninfo")); + +} Unnecessary blank line at the end of the function ~~~ 59. +/* + * Re-read the config file. + * + * If any of the slot sync GUCs changed, validate the values again + * through validate_slotsync_parameters() which will exit the worker + * if validaity fails. + */ SUGGESTION If any of the slot sync GUCs have changed, re-validate them. The worker will exit if the check fails. ~~~ 60. + char *conninfo = pstrdup(PrimaryConnInfo); + char *slotname = pstrdup(PrimarySlotName); + bool syncslot = enable_syncslot; + bool standbyfeedback = hot_standby_feedback; For clarity, I would have used var names to match the old GUCs. e.g. /conninfo/old_primary_conninfo/ /slotname/old_primary_slot_name/ /syncslot/old_enable_syncslot/ /standbyfeedback/old_hot_standby_feedback/ ~~~ 61. + dbname = walrcv_get_dbname_from_conninfo(PrimaryConnInfo); + Assert(dbname); This code seems premature. IIUC this is only needed to detect that the dbname was changed. But I think the prerequisite is first that the conninfoChanged is true. So really this code should be guarded by if (conninfoChanged) so it can be done later in the function. ~~~ 62. + if (conninfoChanged || slotnameChanged || + (syncslot != enable_syncslot) || + (standbyfeedback != hot_standby_feedback)) + { + revalidate = true; + } SUGGESTION revalidate = conninfoChanged || slotnameChanged || (syncslot != enable_syncslot) || (standbyfeedback != hot_standby_feedback); ~~~ 63. + /* + * Since we have initialized this worker with old dbname, thus exit if + * dbname changed. Let it get restarted and connect to new dbname + * specified. + */ + if (conninfoChanged && strcmp(dbname, new_dbname) != 0) + { + ereport(ERROR, + errmsg("exiting slot sync woker as dbname in " + "primary_conninfo changed")); + } 63a. /old dbname/the old dbname/ /new dbname/the new dbname/ /woker/worker/ ~ 63b. This code feels awkward. Can't this dbname check and accompanying ERROR message be moved down into validate_slotsync_parameters(), so it lives along with all the other GUC validation logic? Maybe you'll need to change the validate_slotsync_parameters() parameters slightly but I think it is much better to keep all the validation together. ~~~ 64. + + +/* + * Interrupt handler for main loop of slot sync worker. + */ +static void +ProcessSlotSyncInterrupts(WalReceiverConn **wrconn) Double blank lines. ~~~ 65. + + + if (ConfigReloadPending) + slotsync_reread_config(); +} Double blank lines ~~~ 66. slotsync_worker_onexit +static void +slotsync_worker_onexit(int code, Datum arg) +{ + SpinLockAcquire(&SlotSyncWorker->mutex); + SlotSyncWorker->pid = 0; + SpinLockRelease(&SlotSyncWorker->mutex); +} Should assignment use InvalidPid (-1) instead of 0? ~~~ 67. ReplSlotSyncWorkerMain + SpinLockAcquire(&SlotSyncWorker->mutex); + + Assert(SlotSyncWorker->pid == 0); + + /* Advertise our PID so that the startup process can kill us on promotion */ + SlotSyncWorker->pid = MyProcPid; + + SpinLockRelease(&SlotSyncWorker->mutex); Shouldn't pid start as InvalidPid (-1) instead of Assert 0? ~~~ 68. + /* Connect to the primary server */ + wrconn = remote_connect(); + + /* + * Connect to primary and validate the slot specified in + * primary_slot_name. + */ + validate_primary_slot(wrconn); Maybe needs some slight rewording in the 2nd comment. "Connect to primary server" is already said and done in the 1st part. ~~~ 69. IsSlotSyncWorker +/* + * Is current process the slot sync worker? + */ +bool +IsSlotSyncWorker(void) +{ + return SlotSyncWorker->pid == MyProcPid; +} 69a. For consistency with others like it, I thought this be called IsLogicalSlotSyncWorker(). ~ 69b. For consistency with the others like this, I think the extern should be declared in logicalworker.h ~~~ 70. ShutDownSlotSync + SpinLockAcquire(&SlotSyncWorker->mutex); + if (!SlotSyncWorker->pid) + { + SpinLockRelease(&SlotSyncWorker->mutex); + return; + } IMO should be comparing with InvalidPid (-1) here; not 0. ~~~ 71. + SpinLockAcquire(&SlotSyncWorker->mutex); + + /* Is it gone? */ + if (!SlotSyncWorker->pid) + break; + + SpinLockRelease(&SlotSyncWorker->mutex); Ditto. bad pids should be InvalidPid (-1), not 0. ~~~ 72. SlotSyncWorkerShmemInit + if (!found) + { + memset(SlotSyncWorker, 0, size); + SpinLockInit(&SlotSyncWorker->mutex); + } Probably here the unassigned pid should be set to InvalidPid (-1), not 0. ~~~ 73. SlotSyncWorkerRegister + if (!enable_syncslot) + { + ereport(LOG, + errmsg("skipping slots synchronization as enable_syncslot is " + "disabled.")); + return; + } /as/because/ ====== src/backend/replication/logical/tablesync.c 74. #include "commands/copy.h" +#include "commands/subscriptioncmds.h" #include "miscadmin.h" There were only #include changes but no code changes. Is the #include needed? ====== src/backend/replication/slot.c 75. ReplicationSlotCreate void ReplicationSlotCreate(const char *name, bool db_specific, ReplicationSlotPersistency persistency, - bool two_phase, bool failover) + bool two_phase, bool failover, char sync_state) The function comment goes to trouble to describe all the parameters except for 'failover' and 'sync_slate'. I think a failover comment should be added in patch 0001 and then the sync_state comment should be added in patch 0002. ~~~ 76. + /* + * Do not allow users to drop the slots which are currently being synced + * from the primary to the standby. + */ + if (user_cmd && RecoveryInProgress() && + MyReplicationSlot->data.sync_state != SYNCSLOT_STATE_NONE) + { + ReplicationSlotRelease(); + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot drop replication slot \"%s\"", name), + errdetail("This slot is being synced from the primary."))); + } Should the errdetail give the current state? ====== src/backend/tcop/postgres.c 77. + else if (IsSlotSyncWorker()) + { + ereport(DEBUG1, + (errmsg_internal("replication slot sync worker is shutting down due to administrator command"))); + + /* + * Slot sync worker can be stopped at any time. + * Use exit status 1 so the background worker is restarted. + */ + proc_exit(1); + } Explicitly saying "ereport(DEBUG1, errmsg_internal(..." is a bit overkill; it is simpler to write this as "elog(DEBUG1, ....); ====== src/include/replication/slot.h 78. +/* The possible values for 'sync_state' in ReplicationSlotPersistentData */ +#define SYNCSLOT_STATE_NONE 'n' /* None for user created slots */ +#define SYNCSLOT_STATE_INITIATED 'i' /* Sync initiated for the slot but + * not completed yet, waiting for + * the primary server to catch-up */ +#define SYNCSLOT_STATE_READY 'r' /* Initialization complete, ready + * to be synced further */ Already questioned the same elsewhere. IIUC the same tri-state values of other attributes might be used here too without needing to introduce 3 new values. e.g. #define SYNCSLOT_STATE_DISABLED 'd' /* No syncing for this slot */ #define SYNCSLOT_STATE_PENDING 'p' /* Sync is enabled but we must wait for the primary server to catch up */ #define SYNCSLOT_STATE_ENABLED 'e' /* Sync is enabled and the slot is ready to be synced */ ~~~ 79. + /* + * Is this a slot created by a sync-slot worker? + * + * Relevant for logical slots on the physical standby. + */ + char sync_state; + I assumed that "Relevant for" means "Only relevant for". It should say that. If correct, IMO a better field name might be 'standby_sync_state' ====== src/test/recovery/t/050_verify_slot_order.pl 80. +$backup_name = 'backup2'; +$primary->backup($backup_name); + +# Create standby3 +my $standby3 = PostgreSQL::Test::Cluster->new('standby3'); +$standby3->init_from_backup( + $primary, $backup_name, + has_streaming => 1, + has_restoring => 1); The mixture of 'backup2' for 'standby3' seems confusing. Is there a reason to call it backup2? ~~~ 81. +# Verify slot properties on the standby +is( $standby3->safe_psql('postgres', + q{SELECT failover, sync_state FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';} + ), + "t|r", + 'logical slot has sync_state as ready and failover as true on standby'); It might be better if the message has the same order as the SQL. Eg. "failover as true and sync_state as ready". ~~~ 82. +# Verify slot properties on the primary +is( $primary->safe_psql('postgres', + q{SELECT failover, sync_state FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';} + ), + "t|n", + 'logical slot has sync_state as none and failover as true on primary'); + It might be better if the message has the same order as the SQL. Eg. "failover as true and sync_state as none". ~~~ 83. +# Test to confirm that restart_lsn of the logical slot on the primary is synced to the standby IMO the major test parts (like this one) may need more highlighting "# ---------------------" so those comments don't get lost among all the other comments. ~~~ 84. +# let the slots get synced on the standby +sleep 2; Won't this make the test prone to failure on slow machines? Is there not a more deterministic way to wait for the sync? ====== Kind Regards, Peter Smith. Fujitsu Australia