Hi,

On 12/4/23 6:10 AM, shveta malik wrote:
On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond <nisha.moond...@gmail.com> wrote:

Review for v41 patch.

Thanks for the feedback.

~~~
2.
IIUC, the slotsyncworker's connection to the primary is to execute a
query. Its aim is not walsender type connection, but at primary when
queried, the 'backend_type' is set to 'walsender'.
Snippet from primary db-

datname  |   usename   | application_name | wait_event_type | backend_type
---------+-------------+------------------+-----------------+--------------
postgres | replication | slotsyncworker   | Client          | walsender

Is it okay?


Slot sync worker uses 'libpqrcv_connect' for connection which sends
'replication'-'database' key-value pair as one of the connection
options. And on the primary side, 'ProcessStartupPacket' on the basis
of this key-value pair sets the process as walsender one (am_walsender
= true).
And thus this reflects as backend_type='walsender' in
pg_stat_activity. I do not see any harm in this backend_type for
slot-sync worker currently. This is on a similar line of connections
used for logical-replications. And since a slot-sync worker also deals
with wals-positions (lsns), it is okay to maintain backend_type as
walsender unless you (or others) see any potential issue in doing
that. So let me know.

I don't see any issue as well (though I understand it might
seems weird to see a walsender process being spawned doing non
replication stuff)


~~~
4. primary_slot_name GUC value test:

When standby is started with a non-existing primary_slot_name, the
wal-receiver gives an error but the slot-sync worker does not raise
any error/warning. It is no-op though as it has a check 'if
(XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'.   Is this
okay or shall the slot-sync worker too raise an error and exit?

In another case, when standby is started with valid primary_slot_name,
but it is changed to some invalid value in runtime, then walreceiver
starts giving error but the slot-sync worker keeps on running. In this
case, unlike the previous case, it even did not go to no-op mode (as
it sees valid WalRcv->latestWalEnd from the earlier run) and keep
pinging primary repeatedly for slots.  Shall here it should error out
or at least be no-op until we give a valid primary_slot_name?



Nice catch, thanks!

I reviewed it. There is no way to test the existence/validity of
'primary_slot_name' on standby without making a connection to primary.
If primary_slot_name is invalid from the start, slot-sync worker will
be no-op (as you tested) as WalRecv->latestWalENd will be invalid, and
if 'primary_slot_name' is changed to invalid on runtime, slot-sync
worker will still keep on pinging primary. But that should be okay (in
fact needed) as it needs to sync at-least the previous slot's
positions (in case it is delayed in doing so for some reason earlier).
And once the slots are up-to-date on standby, even if worker pings
primary, it will not see any change in slots lsns and thus go for
longer nap. I think, it is not worth the effort to introduce the
complexity of checking validity of 'primary_slot_name' on primary from
standby for this rare scenario.


Maybe another option could be to have the walreceiver a way to let the slot sync
worker knows that it (the walreceiver) was not able to start due to non existing
replication slot on the primary? (that way we'd avoid the slot sync worker 
having
to talk to the primary).

Not sure about the extra effort to make it works though.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Reply via email to