Re: Synchronizing slots from primary to standby

shveta malik Mon, 07 Aug 2023 22:02:17 -0700

On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand
<[email protected]> wrote:
>
> Hi,
>
> On 8/4/23 1:32 PM, shveta malik wrote:
> > On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand
> > <[email protected]> wrote:
> >> On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
>
> >> Sorry to be late, but I gave a second thought and I wonder if we really 
> >> need this design.
> >> (i.e start a logical replication background worker on the standby to sync 
> >> the slots).
> >>
> >> Wouldn't that be simpler to "just" update the sync slots "metadata"
> >> as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned 
> >> by Peter
> >> up-thread) is doing?
> >> (making use of LogicalConfirmReceivedLocation(), 
> >> LogicalIncreaseXminForSlot()
> >> and LogicalIncreaseRestartDecodingForSlot(), If I read 
> >> synchronize_one_slot() correctly).
> >>
> >
> > Agreed. It would be simpler to just update the metadata. I think you
> > have not got chance to review the latest posted patch ('v10-0003')
> > yet, it does the same.
>
> Thanks for the feedback! Yeah, I did not look at v10 in details and was
> looking at the email thread only.
>
> Indeed, I now see that 0003 does update the metadata in local_slot_advance(),
> that's great!
>
> >
> > But I do not quite get it as in how can we do it w/o starting a
> > background worker?
>
> Yeah, agree that we still need background workers.
> What I meant was to avoid to use "logical replication background worker"
> (aka through logicalrep_worker_launch()) to sync the slots.
>


Agreed. That is why in v10,v11 patches, we have different infra for
sync-slot worker i.e. it is not relying on "logical replication
background worker" anymore.

> > The question here is how many background workers we
> > need to have. Will one be sufficient or do we need one per db (as done
> > earlier by the original patches in this thread) or are we good with
> > dividing work among some limited number of workers?
> >
> > I feel syncing all slots in one worker may increase the lag between
> > subsequent syncs for a particular slot and if the number of slots are
> > huge, the chances of losing the slot-data is more in case of failure.
> > Starting one worker per db also might not be that efficient as it will
> > increase load on the system (both in terms of background worker and
> > network traffic) especially for a case where the number of dbs are
> > more. Thus starting max 'n' number of workers where 'n' is decided by
> > GUC and dividing the work/DBs among these looks a better option to me.
> > Please see the discussion in and around the email at [1]
> >
> > [1]: 
> > https://www.postgresql.org/message-id/CAJpy0uCT%2BnpL4eUvCWiV_MBEri9ixcUgJVDdsBCJSqLd0oD1fQ%40mail.gmail.com
>
> Thanks for the link! If I read the email thread correctly, this discussion
> was before V10 (which is the first version making use of 
> LogicalConfirmReceivedLocation(),
> LogicalIncreaseXminForSlot(), LogicalIncreaseRestartDecodingForSlot()) means
> before the metadata sync only has been implemented.
>
> While I agree that the approach to split the sync load among workers with the 
> new
> max_slot_sync_workers GUC seems reasonable without the sync only feature (pre 
> V10),
> I'm not sure that with the metadata sync only in place the extra complexity 
> to manage multiple
> sync workers is needed.
>
> Maybe we should start some tests/benchmark with only one sync worker to get 
> numbers
> and start from there?

Yes, we can do that performance testing to figure out the difference
between the two modes. I will try to get some statistics on this.

thanks
Shveta

Re: Synchronizing slots from primary to standby

Reply via email to