Re: Synchronizing slots from primary to standby

Drouvot, Bertrand Mon, 07 Aug 2023 02:48:05 -0700

Hi,

On 8/4/23 1:32 PM, shveta malik wrote:

On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:

On 7/28/23 4:39 PM, Bharath Rupireddy wrote:

Sorry to be late, but I gave a second thought and I wonder if we really need 
this design.
(i.e start a logical replication background worker on the standby to sync the 
slots).

Wouldn't that be simpler to "just" update the sync slots "metadata"
as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by 
Peter
up-thread) is doing?
(making use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot()
and LogicalIncreaseRestartDecodingForSlot(), If I read synchronize_one_slot() 
correctly).


Agreed. It would be simpler to just update the metadata. I think you
have not got chance to review the latest posted patch ('v10-0003')
yet, it does the same.


Thanks for the feedback! Yeah, I did not look at v10 in details and was
looking at the email thread only.

Indeed, I now see that 0003 does update the metadata in local_slot_advance(),
that's great!


But I do not quite get it as in how can we do it w/o starting a

background worker?


Yeah, agree that we still need background workers.
What I meant was to avoid to use "logical replication background worker"
(aka through logicalrep_worker_launch()) to sync the slots.

The question here is how many background workers we
need to have. Will one be sufficient or do we need one per db (as done
earlier by the original patches in this thread) or are we good with
dividing work among some limited number of workers?

I feel syncing all slots in one worker may increase the lag between
subsequent syncs for a particular slot and if the number of slots are
huge, the chances of losing the slot-data is more in case of failure.
Starting one worker per db also might not be that efficient as it will
increase load on the system (both in terms of background worker and
network traffic) especially for a case where the number of dbs are
more. Thus starting max 'n' number of workers where 'n' is decided by
GUC and dividing the work/DBs among these looks a better option to me.
Please see the discussion in and around the email at [1]

[1]: 
https://www.postgresql.org/message-id/CAJpy0uCT%2BnpL4eUvCWiV_MBEri9ixcUgJVDdsBCJSqLd0oD1fQ%40mail.gmail.com


Thanks for the link! If I read the email thread correctly, this discussion
was before V10 (which is the first version making use of 
LogicalConfirmReceivedLocation(),
LogicalIncreaseXminForSlot(), LogicalIncreaseRestartDecodingForSlot()) means
before the metadata sync only has been implemented.

While I agree that the approach to split the sync load among workers with the 
new
max_slot_sync_workers GUC seems reasonable without the sync only feature (pre 
V10),
I'm not sure that with the metadata sync only in place the extra complexity to 
manage multiple
sync workers is needed.

Maybe we should start some tests/benchmark with only one sync worker to get 
numbers
and start from there?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: Synchronizing slots from primary to standby

Reply via email to