On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand <bertranddrouvot...@gmail.com> wrote: > > Hi, > > On 8/4/23 1:32 PM, shveta malik wrote: > > On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand > > <bertranddrouvot...@gmail.com> wrote: > >> On 7/28/23 4:39 PM, Bharath Rupireddy wrote: > > >> Sorry to be late, but I gave a second thought and I wonder if we really > >> need this design. > >> (i.e start a logical replication background worker on the standby to sync > >> the slots). > >> > >> Wouldn't that be simpler to "just" update the sync slots "metadata" > >> as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned > >> by Peter > >> up-thread) is doing? > >> (making use of LogicalConfirmReceivedLocation(), > >> LogicalIncreaseXminForSlot() > >> and LogicalIncreaseRestartDecodingForSlot(), If I read > >> synchronize_one_slot() correctly). > >> > > > > Agreed. It would be simpler to just update the metadata. I think you > > have not got chance to review the latest posted patch ('v10-0003') > > yet, it does the same. > > Thanks for the feedback! Yeah, I did not look at v10 in details and was > looking at the email thread only. > > Indeed, I now see that 0003 does update the metadata in local_slot_advance(), > that's great! > > > > > But I do not quite get it as in how can we do it w/o starting a > > background worker? > > Yeah, agree that we still need background workers. > What I meant was to avoid to use "logical replication background worker" > (aka through logicalrep_worker_launch()) to sync the slots. >
Agreed. That is why in v10,v11 patches, we have different infra for sync-slot worker i.e. it is not relying on "logical replication background worker" anymore. > > The question here is how many background workers we > > need to have. Will one be sufficient or do we need one per db (as done > > earlier by the original patches in this thread) or are we good with > > dividing work among some limited number of workers? > > > > I feel syncing all slots in one worker may increase the lag between > > subsequent syncs for a particular slot and if the number of slots are > > huge, the chances of losing the slot-data is more in case of failure. > > Starting one worker per db also might not be that efficient as it will > > increase load on the system (both in terms of background worker and > > network traffic) especially for a case where the number of dbs are > > more. Thus starting max 'n' number of workers where 'n' is decided by > > GUC and dividing the work/DBs among these looks a better option to me. > > Please see the discussion in and around the email at [1] > > > > [1]: > > https://www.postgresql.org/message-id/CAJpy0uCT%2BnpL4eUvCWiV_MBEri9ixcUgJVDdsBCJSqLd0oD1fQ%40mail.gmail.com > > Thanks for the link! If I read the email thread correctly, this discussion > was before V10 (which is the first version making use of > LogicalConfirmReceivedLocation(), > LogicalIncreaseXminForSlot(), LogicalIncreaseRestartDecodingForSlot()) means > before the metadata sync only has been implemented. > > While I agree that the approach to split the sync load among workers with the > new > max_slot_sync_workers GUC seems reasonable without the sync only feature (pre > V10), > I'm not sure that with the metadata sync only in place the extra complexity > to manage multiple > sync workers is needed. > > Maybe we should start some tests/benchmark with only one sync worker to get > numbers > and start from there? Yes, we can do that performance testing to figure out the difference between the two modes. I will try to get some statistics on this. thanks Shveta