Re: Synchronizing slots from primary to standby

2024-01-11 Thread shveta malik
On Thu, Jan 11, 2024 at 7:28 AM Peter Smith  wrote:
>
> Here are some review comments for patch v58-0002

Thank you for the feedback. These are addressed in v60. Please find my
responses inline for a few of them.

> (FYI - I quickly checked with the latest v59-0002 and AFAIK all these
> review comments below are still relevant)
>
> ==
> Commit message
>
> 1.
> If a logical slot is invalidated on the primary, slot on the standby is also
> invalidated.
>
> ~
>
> /slot on the standby/then that slot on the standby/
>
> ==
> doc/src/sgml/logicaldecoding.sgml
>
> 2.
> In order to resume logical replication after failover from the synced
> logical slots, it is required that 'conninfo' in subscriptions are
> altered to point to the new primary server using ALTER SUBSCRIPTION
> ... CONNECTION. It is recommended that subscriptions are first
> disabled before promoting the standby and are enabled back once these
> are altered as above after failover.
>
> ~
>
> Minor rewording mainly to reduce a long sentence.
>
> SUGGESTION
> To resume logical replication after failover from the synced logical
> slots, the subscription's 'conninfo' must be altered to point to the
> new primary server. This is done using ALTER SUBSCRIPTION ...
> CONNECTION. It is recommended that subscriptions are first disabled
> before promoting the standby and are enabled back after altering the
> connection string.
>
> ==
> doc/src/sgml/system-views.sgml
>
> 3.
> +  
> +   synced bool
> +  
> +  
> +  True if this logical slot was synced from a primary server.
> +  
> +  
>
> SUGGESTION
> True if this is a logical slot that was synced from a primary server.
>
> ==
> src/backend/access/transam/xlogrecovery.c
>
> 4.
> + /*
> + * Shutdown the slot sync workers to prevent potential conflicts between
> + * user processes and slotsync workers after a promotion.
> + *
> + * We do not update the 'synced' column from true to false here, as any
> + * failed update could leave some slot's 'synced' column as false. This
> + * could cause issues during slot sync after restarting the server as a
> + * standby. While updating after switching to the new timeline is an
> + * option, it does not simplify the handling for 'synced' column.
> + * Therefore, we retain the 'synced' column as true after promotion as they
> + * can provide useful information about their origin.
> + */
>
> Minor comment wording changes.
>
> BEFORE
> ...any failed update could leave some slot's 'synced' column as false.
> SUGGESTION
> ...any failed update could leave 'synced' column false for some slots.
>
> ~
>
> BEFORE
> Therefore, we retain the 'synced' column as true after promotion as
> they can provide useful information about their origin.
> SUGGESTION
> Therefore, we retain the 'synced' column as true after promotion as it
> may provide useful information about the slot origin.
>
> ==
> src/backend/replication/logical/slotsync.c
>
> 5.
> + * While creating the slot on physical standby, if the local restart_lsn 
> and/or
> + * local catalog_xmin is ahead of those on the remote then the worker cannot
> + * create the local slot in sync with the primary server because that would
> + * mean moving the local slot backwards and the standby might not have WALs
> + * retained for old LSN. In this case, the worker will mark the slot as
> + * RS_TEMPORARY. Once the primary server catches up, it will move the slot to
> + * RS_PERSISTENT and will perform the sync periodically.
>
> /will move the slot to RS_PERSISTENT/will mark the slot as RS_PERSISTENT/
>
> ~~~
>
> 6. drop_synced_slots_internal
> +/*
> + * Helper function for drop_obsolete_slots()
> + *
> + * Drops synced slot identified by the passed in name.
> + */
> +static void
> +drop_synced_slots_internal(const char *name, bool nowait)
> +{
> + Assert(MyReplicationSlot == NULL);
> +
> + ReplicationSlotAcquire(name, nowait);
> +
> + Assert(MyReplicationSlot->data.synced);
> +
> + ReplicationSlotDropAcquired();
> +}
>
> IMO you don't need this function. AFAICT it is only called from one
> place and does not result in fewer lines of code.
>
> ~~~
>
> 7. get_local_synced_slots
>
> + /* Check if it is logical synchronized slot */
> + if (s->in_use && SlotIsLogical(s) && s->data.synced)
> + {
> + local_slots = lappend(local_slots, s);
> + }
>
> Do you need to check SlotIsLogical(s) here? I thought s->data.synced
> can never be true for physical slots. I felt you could write this like
> below:
>
> if (s->in_use && s->data.synced)
> {
>   Assert(SlotIsLogical(s));
>   local_slots = lappend(local_slots, s);
> }
>
> ~~~
>
> 8. check_sync_slot_on_remote
>
> +static bool
> +check_sync_slot_on_remote(ReplicationSlot *local_slot, List *remote_slots,
> +   bool *locally_invalidated)
> +{
> + ListCell   *lc;
> +
> + foreach(lc, remote_slots)
> + {
> + RemoteSlot *remote_slot = (RemoteSlot *) lfirst(lc);
>
> I think you can use the new style foreach_ptr list macros here.
>
> ~~~
>
> 9. drop_obsolete_slots
>
> +drop_obs

Re: Synchronizing slots from primary to standby

2024-01-12 Thread shveta malik
On Fri, Jan 12, 2024 at 5:30 PM Masahiko Sawada  wrote:
>
> On Thu, Jan 11, 2024 at 7:53 PM Amit Kapila  wrote:
> >
> > On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila  wrote:
> > >
> > > +static bool
> > > +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
> > > {
> > > ...
> > > + /* Slot ready for sync, so sync it. */
> > > + else
> > > + {
> > > + /*
> > > + * Sanity check: With hot_standby_feedback enabled and
> > > + * invalidations handled appropriately as above, this should never
> > > + * happen.
> > > + */
> > > + if (remote_slot->restart_lsn < slot->data.restart_lsn)
> > > + elog(ERROR,
> > > + "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> > > + " to remote slot's LSN(%X/%X) as synchronization"
> > > + " would move it backwards", remote_slot->name,
> > > + LSN_FORMAT_ARGS(slot->data.restart_lsn),
> > > + LSN_FORMAT_ARGS(remote_slot->restart_lsn));
> > > ...
> > > }
> > >
> > > I was thinking about the above code in the patch and as far as I can
> > > think this can only occur if the same name slot is re-created with
> > > prior restart_lsn after the existing slot is dropped. Normally, the
> > > newly created slot (with the same name) will have higher restart_lsn
> > > but one can mimic it by copying some older slot by using
> > > pg_copy_logical_replication_slot().
> > >
> > > I don't think as mentioned in comments even if hot_standby_feedback is
> > > temporarily set to off, the above shouldn't happen. It can only lead
> > > to invalidated slots on standby.
> > >
> > > To close the above race, I could think of the following ways:
> > > 1. Drop and re-create the slot.
> > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves
> > > ahead of local_slot's LSN then we can update it; but as mentioned in
> > > your previous comment, we need to update all other fields as well. If
> > > we follow this then we probably need to have a check for catalog_xmin
> > > as well.
> > >
> >
> > The second point as mentioned is slightly misleading, so let me try to
> > rephrase it once again: Emit LOG/WARNING in this case and once
> > remote_slot's LSN moves ahead of local_slot's LSN then we can update
> > it; additionally, we need to update all other fields like two_phase as
> > well. If we follow this then we probably need to have a check for
> > catalog_xmin as well along remote_slot's restart_lsn.
> >
> > > Now, related to this the other case which needs some handling is what
> > > if the remote_slot's restart_lsn is greater than local_slot's
> > > restart_lsn but it is a re-created slot with the same name. In that
> > > case, I think the other properties like 'two_phase', 'plugin' could be
> > > different. So, is simply copying those sufficient or do we need to do
> > > something else as well?
> > >
> >
> > Bertrand, Dilip, Sawada-San, and others, please share your opinion on
> > this problem as I think it is important to handle this race condition.
>
> Is there any good use case of copying a failover slot in the first
> place? If it's not a normal use case and we can probably live without
> it, why not always disable failover during the copy? FYI we always
> disable two_phase on copied slots. It seems to me that copying a
> failover slot could lead to problems, as long as we synchronize slots
> based on their names. IIUC without the copy, this path should never
> happen.
>
> Regards,
>
> --
> Masahiko Sawada
> Amazon Web Services: https://aws.amazon.com
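The race discussed above — a slot re-created (or copied via
pg_copy_logical_replication_slot()) with the same name but an older
restart_lsn — boils down to name-based matching being unable to tell the
new slot from the original. A minimal sketch of the one observable
symptom on the standby, using hypothetical simplified types rather than
the real slot structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical stand-in for the remote/local slot information. */
typedef struct
{
    char        name[64];
    XLogRecPtr  restart_lsn;
} SlotInfo;

/*
 * Name-based sync cannot distinguish a re-created or copied slot from
 * the original; the only detectable symptom is the remote restart_lsn
 * moving backwards.  Applying it would point the local slot at WAL the
 * standby may no longer retain, so the caller must instead error out,
 * emit a LOG/WARNING, or drop and re-create the local slot.
 */
static bool
remote_lsn_moved_backwards(const SlotInfo *local, const SlotInfo *remote)
{
    assert(strcmp(local->name, remote->name) == 0);
    return remote->restart_lsn < local->restart_lsn;
}
```

This only illustrates the detection side; which of the proposed recovery
options (drop-and-recreate vs. LOG/WARNING and wait) to take is exactly
what is being debated here.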


Multiple approaches have been discussed and tried for starting a
slot-sync worker. I am summarizing them all here:

 1) Make the slotsync worker an auxiliary process (like checkpointer,
walwriter, walreceiver etc.). The benefit of this approach is that
start and stop can be controlled more flexibly, as each auxiliary
process can have different checks before starting and different stop
conditions. But it needs code duplication for process management
(start, stop, crash handling, signals etc.), and currently auxiliary
processes do not support a db-connection smoothly (none of them has
one so far).

We attempted to make the slot-sync worker an auxiliary process and
faced some challenges. The slot sync worker needs a db-connection and
thus needs InitPostgres(). But AuxiliaryProcessMain() and
InitPostgres() are not compatible, as both invoke common functions and
end up setting many callback functions twice (with different args).
Also, InitPostgres() does 'MyBackendId' initialization (which further
triggers some stuff) that is not needed for an auxiliary process, and
so on. Thus, in order to make the slot-sync worker an auxiliary
process, we need something similar to InitPostgres (a trimmed-down
working version), which needs further detailed analysis.

2) Make the slotsync worker a 'special' process like AutoVacLauncher,
which is neither an auxiliary process nor a bgworker. It allows a
db-connection and also provides the flexibility to have start and stop
conditions for a process. But it needs a lot of code-duplication
ar

Re: Synchronizing slots from primary to standby

2024-01-15 Thread shveta malik
On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila  wrote:
>
> On Fri, Jan 12, 2024 at 5:50 PM shveta malik  wrote:
> >
> > There are multiple approaches discussed and tried when it comes to
> > starting a slot-sync worker. I am summarizing all here:
> >
> >  1) Make slotsync worker as an Auxiliary Process (like checkpointer,
> > walwriter, walreceiver etc). The benefit this approach provides is, it
> > can control begin and stop in a more flexible way as each auxiliary
> > process could have different checks before starting and can have
> > different stop conditions. But it needs code duplication for process
> > management(start, stop, crash handling, signals etc) and currently it
> > does not support db-connection smoothly (none of the auxiliary process
> > has one so far)
> >
>
> As slotsync worker needs to perform transactions and access syscache,
> we can't make it an auxiliary process as that doesn't initialize the
> required stuff like syscache. Also, see the comment "Auxiliary
> processes don't run transactions ..." in AuxiliaryProcessMain() which
> means this is not an option.
>
> >
> > 2) Make slotsync worker as a 'special' process like AutoVacLauncher
> > which is neither an Auxiliary process nor a bgworker one. It allows
> > db-connection and also provides flexibility to have start and stop
> > conditions for a process.
> >
>
> Yeah, due to these reasons, I think this option is worth considering
> and another plus point is that this allows us to make enable_syncslot
> a PGC_SIGHUP GUC rather than a PGC_POSTMASTER.
>
> >
> > 3) Make slotsync worker a bgworker. Here we just need to register our
> > process as a bgworker (RegisterBackgroundWorker()) by providing a
> > relevant start_time and restart_time and then the process management
> > is well taken care of. It does not need any code-duplication and
> > allows db-connection smoothly in registered process. The only thing it
> > lacks is that it does not provide flexibility of having
> > start-condition which then makes us to have 'enable_syncslot' as
> > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I
> > feel enable_syncslot is something which will not be changed frequently
> > and with the benefits provided by bgworker infra, it seems a
> > reasonably good option to choose this approach.
> >
>
> I agree but it may be better to make it a PGC_SIGHUP parameter.
>
> > 4) Another option is to have Logical Replication Launcher(or a new
> > process) to launch slot-sync worker. But going by the current design
> > where we have only 1 slotsync worker, it may be an overhead to have an
> > additional manager process maintained.
> >
>
> I don't see any good reason to have an additional launcher process here.
>
> >
> > Thus weighing pros and cons of all these options, we have currently
> > implemented the bgworker approach (approach 3).  Any feedback is
> > welcome.
> >
>
> I vote to go for (2) unless we face difficulties in doing so but (3)
> is also okay especially if others also think so.

I am not against any of the approaches but I still feel that when we
have a standard way of doing things (bgworker) we should not keep
adding code to do things in a special way unless there is a strong
reason to do so. Now we need to decide whether 'enable_syncslot' being
PGC_POSTMASTER is a strong reason to go the non-standard way. If yes,
then we should think of option 2; else option 3 seems better in my
understanding (which may be limited due to my short experience here),
so I am all ears to what others think on this.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-16 Thread shveta malik
On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada  wrote:
>
> On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila  wrote:
> >
> > On Tue, Jan 16, 2024 at 9:03 AM shveta malik  wrote:
> > >
> > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik  
> > > > wrote:
> > > > >
> > > > > There are multiple approaches discussed and tried when it comes to
> > > > > starting a slot-sync worker. I am summarizing all here:
> > > > >
> > > > >  1) Make slotsync worker as an Auxiliary Process (like checkpointer,
> > > > > walwriter, walreceiver etc). The benefit this approach provides is, it
> > > > > can control begin and stop in a more flexible way as each auxiliary
> > > > > process could have different checks before starting and can have
> > > > > different stop conditions. But it needs code duplication for process
> > > > > management(start, stop, crash handling, signals etc) and currently it
> > > > > does not support db-connection smoothly (none of the auxiliary process
> > > > > has one so far)
> > > > >
> > > >
> > > > As slotsync worker needs to perform transactions and access syscache,
> > > > we can't make it an auxiliary process as that doesn't initialize the
> > > > required stuff like syscache. Also, see the comment "Auxiliary
> > > > processes don't run transactions ..." in AuxiliaryProcessMain() which
> > > > means this is not an option.
> > > >
> > > > >
> > > > > 2) Make slotsync worker as a 'special' process like AutoVacLauncher
> > > > > which is neither an Auxiliary process nor a bgworker one. It allows
> > > > > db-connection and also provides flexibility to have start and stop
> > > > > conditions for a process.
> > > > >
> > > >
> > > > Yeah, due to these reasons, I think this option is worth considering
> > > > and another plus point is that this allows us to make enable_syncslot
> > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER.
> > > >
> > > > >
> > > > > 3) Make slotsync worker a bgworker. Here we just need to register our
> > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a
> > > > > relevant start_time and restart_time and then the process management
> > > > > is well taken care of. It does not need any code-duplication and
> > > > > allows db-connection smoothly in registered process. The only thing it
> > > > > lacks is that it does not provide flexibility of having
> > > > > start-condition which then makes us to have 'enable_syncslot' as
> > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I
> > > > > feel enable_syncslot is something which will not be changed frequently
> > > > > and with the benefits provided by bgworker infra, it seems a
> > > > > reasonably good option to choose this approach.
> > > > >
> > > >
> > > > I agree but it may be better to make it a PGC_SIGHUP parameter.
> > > >
> > > > > 4) Another option is to have Logical Replication Launcher(or a new
> > > > > process) to launch slot-sync worker. But going by the current design
> > > > > where we have only 1 slotsync worker, it may be an overhead to have an
> > > > > additional manager process maintained.
> > > > >
> > > >
> > > > I don't see any good reason to have an additional launcher process here.
> > > >
> > > > >
> > > > > Thus weighing pros and cons of all these options, we have currently
> > > > > implemented the bgworker approach (approach 3).  Any feedback is
> > > > > welcome.
> > > > >
> > > >
> > > > I vote to go for (2) unless we face difficulties in doing so but (3)
> > > > is also okay especially if others also think so.
> > >
> > > I am not against any of the approaches but I still feel that when we
> > > have a standard way of doing things (bgworker) we should not keep
> > > adding code to do things in a special way unless there is a strong
> > > reason to do so. Now we need to decide if 'enable_syncslot' being
> > > PGC_P

Re: Synchronizing slots from primary to standby

2024-01-16 Thread shveta malik
On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila  wrote:
>
> On Fri, Jan 12, 2024 at 5:50 PM shveta malik  wrote:
> >
> > There are multiple approaches discussed and tried when it comes to
> > starting a slot-sync worker. I am summarizing all here:
> >
> >  1) Make slotsync worker as an Auxiliary Process (like checkpointer,
> > walwriter, walreceiver etc). The benefit this approach provides is, it
> > can control begin and stop in a more flexible way as each auxiliary
> > process could have different checks before starting and can have
> > different stop conditions. But it needs code duplication for process
> > management(start, stop, crash handling, signals etc) and currently it
> > does not support db-connection smoothly (none of the auxiliary process
> > has one so far)
> >
>
> As slotsync worker needs to perform transactions and access syscache,
> we can't make it an auxiliary process as that doesn't initialize the
> required stuff like syscache. Also, see the comment "Auxiliary
> processes don't run transactions ..." in AuxiliaryProcessMain() which
> means this is not an option.
>
> >
> > 2) Make slotsync worker as a 'special' process like AutoVacLauncher
> > which is neither an Auxiliary process nor a bgworker one. It allows
> > db-connection and also provides flexibility to have start and stop
> > conditions for a process.
> >
>
> Yeah, due to these reasons, I think this option is worth considering
> and another plus point is that this allows us to make enable_syncslot
> a PGC_SIGHUP GUC rather than a PGC_POSTMASTER.
>
> >
> > 3) Make slotsync worker a bgworker. Here we just need to register our
> > process as a bgworker (RegisterBackgroundWorker()) by providing a
> > relevant start_time and restart_time and then the process management
> > is well taken care of. It does not need any code-duplication and
> > allows db-connection smoothly in registered process. The only thing it
> > lacks is that it does not provide flexibility of having
> > start-condition which then makes us to have 'enable_syncslot' as
> > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said this, I
> > feel enable_syncslot is something which will not be changed frequently
> > and with the benefits provided by bgworker infra, it seems a
> > reasonably good option to choose this approach.
> >
>
> I agree but it may be better to make it a PGC_SIGHUP parameter.
>
> > 4) Another option is to have Logical Replication Launcher(or a new
> > process) to launch slot-sync worker. But going by the current design
> > where we have only 1 slotsync worker, it may be an overhead to have an
> > additional manager process maintained.
> >
>
> I don't see any good reason to have an additional launcher process here.
>
> >
> > Thus weighing pros and cons of all these options, we have currently
> > implemented the bgworker approach (approach 3).  Any feedback is
> > welcome.
> >
>
> I vote to go for (2) unless we face difficulties in doing so but (3)
> is also okay especially if others also think so.

Okay. Attempted approach 2 as a separate patch in v62-0003. Approach 3
(bgworker) is still maintained in v62-0002.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-16 Thread shveta malik
On Wed, Jan 17, 2024 at 6:43 AM Masahiko Sawada  wrote:
>
> On Tue, Jan 16, 2024 at 6:40 PM shveta malik  wrote:
> >
> > On Tue, Jan 16, 2024 at 12:59 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Jan 16, 2024 at 1:07 PM Amit Kapila  
> > > wrote:
> > > >
> > > > On Tue, Jan 16, 2024 at 9:03 AM shveta malik  
> > > > wrote:
> > > > >
> > > > > On Sat, Jan 13, 2024 at 12:54 PM Amit Kapila 
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Jan 12, 2024 at 5:50 PM shveta malik 
> > > > > >  wrote:
> > > > > > >
> > > > > > > There are multiple approaches discussed and tried when it comes to
> > > > > > > starting a slot-sync worker. I am summarizing all here:
> > > > > > >
> > > > > > >  1) Make slotsync worker as an Auxiliary Process (like 
> > > > > > > checkpointer,
> > > > > > > walwriter, walreceiver etc). The benefit this approach provides 
> > > > > > > is, it
> > > > > > > can control begin and stop in a more flexible way as each 
> > > > > > > auxiliary
> > > > > > > process could have different checks before starting and can have
> > > > > > > different stop conditions. But it needs code duplication for 
> > > > > > > process
> > > > > > > management(start, stop, crash handling, signals etc) and 
> > > > > > > currently it
> > > > > > > does not support db-connection smoothly (none of the auxiliary 
> > > > > > > process
> > > > > > > has one so far)
> > > > > > >
> > > > > >
> > > > > > As slotsync worker needs to perform transactions and access 
> > > > > > syscache,
> > > > > > we can't make it an auxiliary process as that doesn't initialize the
> > > > > > required stuff like syscache. Also, see the comment "Auxiliary
> > > > > > processes don't run transactions ..." in AuxiliaryProcessMain() 
> > > > > > which
> > > > > > means this is not an option.
> > > > > >
> > > > > > >
> > > > > > > 2) Make slotsync worker as a 'special' process like 
> > > > > > > AutoVacLauncher
> > > > > > > which is neither an Auxiliary process nor a bgworker one. It 
> > > > > > > allows
> > > > > > > db-connection and also provides flexibility to have start and stop
> > > > > > > conditions for a process.
> > > > > > >
> > > > > >
> > > > > > Yeah, due to these reasons, I think this option is worth considering
> > > > > > and another plus point is that this allows us to make 
> > > > > > enable_syncslot
> > > > > > a PGC_SIGHUP GUC rather than a PGC_POSTMASTER.
> > > > > >
> > > > > > >
> > > > > > > 3) Make slotsync worker a bgworker. Here we just need to register 
> > > > > > > our
> > > > > > > process as a bgworker (RegisterBackgroundWorker()) by providing a
> > > > > > > relevant start_time and restart_time and then the process 
> > > > > > > management
> > > > > > > is well taken care of. It does not need any code-duplication and
> > > > > > > allows db-connection smoothly in registered process. The only 
> > > > > > > thing it
> > > > > > > lacks is that it does not provide flexibility of having
> > > > > > > start-condition which then makes us to have 'enable_syncslot' as
> > > > > > > PGC_POSTMASTER parameter rather than PGC_SIGHUP. Having said 
> > > > > > > this, I
> > > > > > > feel enable_syncslot is something which will not be changed 
> > > > > > > frequently
> > > > > > > and with the benefits provided by bgworker infra, it seems a
> > > > > > > reasonably good option to choose this approach.
> > > > > > >
> > > > > >
> > > > > > I agree but it may be better to make it a PGC_SIGHUP parameter.
> > > > > >
> > > 

Re: Synchronizing slots from primary to standby

2024-01-17 Thread shveta malik
On Wed, Jan 17, 2024 at 3:08 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Tue, Jan 16, 2024 at 05:27:05PM +0530, shveta malik wrote:
> > PFA v62. Details:
>
> Thanks!
>
> > v62-003:
> > It is a new patch which attempts to implement slot-sync worker as a
> > special process which is neither a bgworker nor an Auxiliary process.
> > Here we get the benefit of converting enable_syncslot to a PGC_SIGHUP
> > Guc rather than PGC_POSTMASTER. We launch the slot-sync worker only if
> > it is hot-standby and 'enable_syncslot' is ON.
>
> The implementation looks reasonable to me (from what I can see some parts is
> copy/paste from an already existing "special" process and some parts are
> "sync slot" specific) which makes fully sense.

Thanks for the feedback. I have addressed the comments in v63 except the 5th one.

> A few remarks:
>
> 1 ===
> +* Was it the slot sycn worker?
>
> Typo: sycn
>
> 2 ===
> +* ones), and no walwriter, autovac launcher or bgwriter or 
> slot sync
>
> Instead? "* ones), and no walwriter, autovac launcher, bgwriter or slot sync"
>
> 3 ===
> + * restarting slot slyc worker. If stopSignaled is set, the worker will
>
> Typo: slyc
>
> 4 ===
> +/* Flag to tell if we are in an slot sync worker process */
>
> s/an/a/ ?
>
> 5 === (coming from v62-0002)
> +   Assert(tuplestore_tuple_count(res->tuplestore) == 1);
>
> Is it even possible for the related query to not return only one row? (I 
> think the
> "count" ensures it).

I think you are right. This assertion was added sometime back on the
basis of feedback on hackers. Let me review that again. I can consider
this comment in the next version.

> 6 ===
> if (conninfo_changed ||
> primary_slotname_changed ||
> +   old_enable_syncslot != enable_syncslot ||
> (old_hot_standby_feedback != hot_standby_feedback))
> {
> ereport(LOG,
> errmsg("slot sync worker will restart because 
> of"
>" a parameter change"));
>
> I don't think "slot sync worker will restart" is true if one change 
> enable_syncslot
> from on to off.

Yes, right. I have changed the log-msg in this specific case.

>
> IMHO, v62-003 is in good shape and could be merged in v62-002 (that would ease
> the review). But let's wait to see if others think differently.
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2024-01-18 Thread shveta malik
On Thu, Jan 18, 2024 at 10:31 AM Peter Smith  wrote:
>
> I have one question about the new code in v63-0002.
>
> ==
> src/backend/replication/logical/slotsync.c
>
> 1. ReplSlotSyncWorkerMain
>
> + Assert(SlotSyncWorker->pid == InvalidPid);
> +
> + /*
> + * Startup process signaled the slot sync worker to stop, so if meanwhile
> + * postmaster ended up starting the worker again, exit.
> + */
> + if (SlotSyncWorker->stopSignaled)
> + {
> + SpinLockRelease(&SlotSyncWorker->mutex);
> + proc_exit(0);
> + }
>
> Can we be sure a worker crash can't occur (in ShutDownSlotSync?) in
> such a way that SlotSyncWorker->stopSignaled was already assigned
> true, but SlotSyncWorker->pid was not yet reset to InvalidPid;
>
> e.g. Is the Assert above still OK?

We are good with the Assert here. I tried the below cases:

1) When the slotsync worker is killed using 'kill', it is treated as
SIGTERM; the slot sync worker invokes 'slotsync_worker_onexit()' before
going down and thus sets SlotSyncWorker->pid = InvalidPid. This means
when it is restarted (considering we have put the breakpoints in such
a way that the postmaster had already reached do_start_bgworker() before
promotion finished), it sees stopSignaled set but pid as InvalidPid,
and thus we are good.

2) Another case is when we kill the slot sync worker using 'kill -9' (or
say we make it crash); in such a case, the postmaster signals each
sibling process to quit (including the startup process) and cleans up
the shared memory used by each (including SlotSyncWorker). In such a
case promotion fails. And if the slot sync worker is started again, it
will find pid as InvalidPid. So we are good.
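The two scenarios above rely on one invariant: by the time a new worker
starts, either the previous worker's on-exit callback already reset the
pid (SIGTERM case) or the postmaster reinitialized the shared state
(kill -9 case). A toy sketch of that invariant, with a plain struct
standing in for the spinlock-protected shared memory (hypothetical
names, not the actual code):

```c
#include <assert.h>
#include <stdbool.h>

#define InvalidPid (-1)

/* Simplified stand-in for the spinlock-protected shared struct. */
typedef struct
{
    int   pid;           /* pid of the running worker, or InvalidPid */
    bool  stopSignaled;  /* set by the startup process at promotion */
} SlotSyncCtlData;

/* On-exit callback: runs even on SIGTERM, so pid is always reset. */
static void
slotsync_worker_onexit(SlotSyncCtlData *ctl)
{
    ctl->pid = InvalidPid;
}

/*
 * Worker startup: the assert holds because either the previous worker's
 * on-exit callback ran (SIGTERM) or the postmaster reinitialized shared
 * memory (hard crash).  If stopSignaled is set, refuse to run.
 */
static bool
slotsync_worker_can_start(SlotSyncCtlData *ctl, int mypid)
{
    assert(ctl->pid == InvalidPid);
    if (ctl->stopSignaled)
        return false;           /* promotion in progress: do not run */
    ctl->pid = mypid;
    return true;
}
```

In the real code the check and pid update happen under the
SlotSyncWorker spinlock; the sketch elides the locking to show only the
state transitions being discussed.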

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-18 Thread shveta malik
On Fri, Jan 19, 2024 at 11:23 AM Amit Kapila  wrote:
>
> > > 5 === (coming from v62-0002)
> > > +   Assert(tuplestore_tuple_count(res->tuplestore) == 1);
> > >
> > > Is it even possible for the related query to not return only one row? (I 
> > > think the
> > > "count" ensures it).
> >
> > I think you are right. This assertion was added sometime back on the
> > basis of feedback on hackers. Let me review that again. I can consider
> > this comment in the next version.
> >
>
> OTOH, can't we keep the assert as it is but remove "= 1" from
> "count(*) = 1" in the query. There shouldn't be more than one slot
> with same name on the primary. Or, am I missing something?

There will be 1 record at most, and 0 records if primary_slot_name is
invalid. Keeping 'count(*)=1' gives the benefit that it will straight
away give us true/false, indicating whether we are good wrt
primary_slot_name. I feel the Assert can be removed and we can simply
have:

if (!tuplestore_gettupleslot(res->tuplestore, true, false, tupslot))
elog(ERROR, "failed to fetch primary_slot_name tuple");

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-19 Thread shveta malik
On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada  wrote:
>
>
> Thank you for updating the patch. I have some comments:
>
> ---
> +latestWalEnd = GetWalRcvLatestWalEnd();
> +if (remote_slot->confirmed_lsn > latestWalEnd)
> +{
> +elog(ERROR, "exiting from slot synchronization as the
> received slot sync"
> + " LSN %X/%X for slot \"%s\" is ahead of the
> standby position %X/%X",
> + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> + remote_slot->name,
> + LSN_FORMAT_ARGS(latestWalEnd));
> +}
>
> IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is
> typically the primary server's flush position and doesn't mean the LSN
> where the walreceiver received/flushed up to.

Yes. I think it makes more sense to use something which actually tells
the flushed position. I gave it a try by replacing
GetWalRcvLatestWalEnd() with GetWalRcvFlushRecPtr(), but I see a
problem here. Let's say I have enabled the slot-sync feature on a
running standby; in that case we are all good (flushedUpto is the same
as the actual flush position indicated by LogstreamResult.Flush). But
if I restart the standby, then I observed that the startup process sets
flushedUpto to some value 'x' (see [1]), while when the wal-receiver
starts, it sets 'LogstreamResult.Flush' to another value (see [2])
which is always greater than 'x'. And we do not update flushedUpto with
the 'LogstreamResult.Flush' value in the walreceiver until we actually
do an operation on the primary. Performing a data change on the primary
sends WALs to the standby, which then hits XLogWalRcvFlush() and
updates flushedUpto to the same value as LogstreamResult.Flush. Until
then, we have a situation where slots received on the standby are ahead
of flushedUpto and thus the slotsync worker keeps on erroring out. I am
yet to find out why flushedUpto is set to a lower value than
'LogstreamResult.Flush' at the start of the standby. Or maybe I am
using the wrong function GetWalRcvFlushRecPtr() and should be using
something else instead?

[1]:
Startup process sets 'flushedUpto' here:
ReadPageInternal-->XLogPageRead-->WaitForWALToBecomeAvailable-->RequestXLogStreaming

[2]:
Walreceiver sets 'LogstreamResult.Flush' here but does not update
'flushedUpto':
WalReceiverMain():  LogstreamResult.Write = LogstreamResult.Flush =
GetXLogReplayRecPtr(NULL)


> Does it really happen
> that the slot's confirmed_flush_lsn is higher than the primary's flush
> lsn?

It may happen if we have not configured standby_slot_names on the
primary. In such a case, slots may get updated without confirming that
the standby has received the change, and thus the slot-sync worker may
fetch slots whose LSNs are ahead of the latest WAL position on the standby.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-19 Thread shveta malik
On Thu, Jan 18, 2024 at 4:49 PM Amit Kapila  wrote:

> 2.
> +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
> {
> ...
> + LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
> + xmin_horizon = GetOldestSafeDecodingTransactionId(true);
> + SpinLockAcquire(&slot->mutex);
> + slot->data.catalog_xmin = xmin_horizon;
> + SpinLockRelease(&slot->mutex);
> ...
> }
>
> Here, why slot->effective_catalog_xmin is not updated? The same is
> required by a later call to ReplicationSlotsComputeRequiredXmin(). I
> see that the prior version v60-0002 has the corresponding change but
> it is missing in the latest version. Any reason?

I think it was a mistake in v61. Added it back in v64.

>
> 3.
> + * Return true either if the slot is marked as RS_PERSISTENT (sync-ready) or
> + * is synced periodically (if it was already sync-ready). Return false
> + * otherwise.
> + */
> +static bool
> +update_and_persist_slot(RemoteSlot *remote_slot)
>
> The second part of the above comment (or is synced periodically (if it
> was already sync-ready)) is not clear to me. Does it intend to
> describe the case when we try to update the already created temp slot
> in the last call. If so, that is not very clear because periodically
> sounds like it can be due to repeated sync for sync-ready slot.

The comment was as per the old functionality, where this function was
doing both persist and save. The code changed in v61, but the comment
was not updated. I have changed it now in v64.

> 4.
> +update_and_persist_slot(RemoteSlot *remote_slot)
> {
> ...
> + (void) local_slot_update(remote_slot);
> ...
> }
>
> Can we write a comment to state the reason why we don't care about the
> return value here?

Since it is the first time 'local_slot_update' is happening on any
slot, the return value must be true, i.e. local_slot_update() should
not skip the update. I have thus added an Assert on the return value
now (in v64).

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-22 Thread shveta malik
On Mon, Jan 22, 2024 at 12:28 PM Amit Kapila  wrote:
>
> On Fri, Jan 19, 2024 at 3:55 PM shveta malik  wrote:
> >
> > On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada  
> > wrote:
> > >
> > >
> > > Thank you for updating the patch. I have some comments:
> > >
> > > ---
> > > +latestWalEnd = GetWalRcvLatestWalEnd();
> > > +if (remote_slot->confirmed_lsn > latestWalEnd)
> > > +{
> > > +elog(ERROR, "exiting from slot synchronization as the
> > > received slot sync"
> > > + " LSN %X/%X for slot \"%s\" is ahead of the
> > > standby position %X/%X",
> > > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> > > + remote_slot->name,
> > > + LSN_FORMAT_ARGS(latestWalEnd));
> > > +}
> > >
> > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is
> > > typically the primary server's flush position and doesn't mean the LSN
> > > where the walreceiver received/flushed up to.
> >
> > yes. I think it makes more sense to use something which actually tells
> > flushed-position. I gave it a try by replacing GetWalRcvLatestWalEnd()
> > with GetWalRcvFlushRecPtr() but I see a problem here. Lets say I have
> > enabled the slot-sync feature in a running standby, in that case we
> > are all good (flushedUpto is the same as actual flush-position
> > indicated by LogstreamResult.Flush). But if I restart standby, then I
> > observed that the startup process sets flushedUpto to some value 'x'
> > (see [1]) while when the wal-receiver starts, it sets
> > 'LogstreamResult.Flush' to another value (see [2]) which is always
> > greater than 'x'. And we do not update flushedUpto with the
> > 'LogstreamResult.Flush' value in walreceiver until we actually do an
> > operation on primary. Performing a data change on primary sends WALs
> > to standby which then hits XLogWalRcvFlush() and updates flushedUpto
> > same as LogstreamResult.Flush. Until then we have a situation where
> > slots received on standby are ahead of flushedUpto and thus slotsync
> > worker keeps one erroring out. I am yet to find out why flushedUpto is
> > set to a lower value than 'LogstreamResult.Flush' at the start of
> > standby.  Or maybe am I using the wrong function
> > GetWalRcvFlushRecPtr() and should be using something else instead?
> >
>
> Can we think of using GetStandbyFlushRecPtr()? We probably need to
> expose this function, if this works for the required purpose.

I think we can. For the record, the problem with using flushedUpto
(or GetWalRcvFlushRecPtr()) directly is that it is not set to the
latest flushed position immediately after startup. It points to some
prior location (perhaps a segment or page start) after startup, until
some data is flushed next, which then updates it to the latest flushed
position; thus we cannot use it directly. GetStandbyFlushRecPtr(),
OTOH, takes care of this, i.e. it returns the correct flushed location
at any point in time. I have changed v65 to use this one.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-22 Thread shveta malik
On Fri, Jan 19, 2024 at 11:48 AM Peter Smith  wrote:
>
> Here are some review comments for patch v63-0003.

Thanks Peter. I have addressed all in v65.

>
> 4b.
> It was a bit different when there were ERRORs but now they are LOGs
> somehow it seems wrong for this function to say what the *caller* will
> do. Maybe you can rewrite all the errmsg so the don't say "skipping"
> but they just say "bad configuration for slot synchronization"
>
> If valid is false then you can LOG "skipping" at the caller...

I have made this change, but now in the log file we see 3 lines like
below. Does it seem apt? Or was the earlier style better, where we got
the info in 2 lines?

[34416] LOG:  bad configuration for slot synchronization
[34416] HINT:  hot_standby_feedback must be enabled.
[34416] LOG:  skipping slot synchronization

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-23 Thread shveta malik
On Tue, Jan 23, 2024 at 9:45 AM Peter Smith  wrote:
>
> Here are some review comments for v65-0002

Thanks Peter for the feedback. I have addressed these in v66.

>
> 4. GetStandbyFlushRecPtr
>
>  /*
> - * Returns the latest point in WAL that has been safely flushed to disk, and
> - * can be sent to the standby. This should only be called when in recovery,
> - * ie. we're streaming to a cascaded standby.
> + * Returns the latest point in WAL that has been safely flushed to disk.
> + * This should only be called when in recovery.
> + *
>
> Since it says "This should only be called when in recovery", should
> there also be a check for that (e.g. RecoveryInProgress) in the added
> Assert?

Since 'am_cascading_walsender' and 'IsLogicalSlotSyncWorker' make
sense only during recovery, I think an explicit check for
'RecoveryInProgress' is not needed here. But I can add it if others
also think it is needed.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-24 Thread shveta malik
On Wed, Jan 24, 2024 at 2:38 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Wed, Jan 24, 2024 at 01:51:54PM +0530, Amit Kapila wrote:
> > On Wed, Jan 24, 2024 at 11:24 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Jan 24, 2024 at 2:43 PM Amit Kapila  
> > > wrote:
> > > >
> > > > >
> > > > > +/* GUC variable */
> > > > > +bool   enable_syncslot = false;
> > > > >
> > > > > Is enable_syncslot a really good name? We use "enable" prefix only for
> > > > > planner parameters such as enable_seqscan, and it seems to me that
> > > > > "slot" is not specific. Other candidates are:
> > > > >
> > > > > * synchronize_replication_slots = on|off
> > > > > * synchronize_failover_slots = on|off
> > > > >
> > > >
> > > > I would prefer the second one. Would it be better to just say
> > > > sync_failover_slots?
> > >
> > > Works for me. But if we want to extend this option for non-failover
> > > slots as well in the future, synchronize_replication_slots (or
> > > sync_replication_slots) seems better. We can extend it by having an
> > > enum later. For example, the values can be on, off, or failover etc.
> > >
> >
> > I see your point. Let us see if others have any suggestions on this.
>
> I also see Sawada-San's point and I'd vote for "sync_replication_slots". Then 
> for
> the current feature I think "failover" and "on" should be the values to turn 
> the
> feature on (assuming "on" would mean "all kind of supported slots").

Even if others agree and we change this GUC name to
"sync_replication_slots", I feel we should keep the values as "on" and
"off" for now, where "on" means 'sync failover slots' (docs can state
that clearly). I do not think we should support syncing "all kinds of
supported slots" in the first version. Maybe we can consider it for
future versions.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-24 Thread shveta malik
On Thu, Jan 25, 2024 at 9:13 AM Amit Kapila  wrote:
>
> > 3) Removed syncing 'failover' on standby from remote_slot. The
> > 'failover' field will be false for synced slots. Since we do not
> > support sync to cascading standbys yet, thus failover=true was
> > misleading and unused there.
> >
>
> But what will happen after the standby is promoted? After promotion,
> ideally, it should have failover enabled, so that the slots can be
> synced. Also, note that corresponding subscriptions still have the
> failover flag enabled. I think we should copy the 'failover' option
> for the synced slots.

Yes, right, missed this point earlier. I will make the change in the
next version.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-25 Thread shveta malik
On Thu, Jan 25, 2024 at 10:39 AM Peter Smith  wrote:

> 2. synchronize_one_slot
>
> + /*
> + * Sanity check: Make sure that concerned WAL is received and flushed
> + * before syncing slot to target lsn received from the primary server.
> + *
> + * This check should never pass as on the primary server, we have waited
> + * for the standby's confirmation before updating the logical slot.
> + */
> + latestFlushPtr = GetStandbyFlushRecPtr(NULL);
> + if (remote_slot->confirmed_lsn > latestFlushPtr)
> + {
> + ereport(LOG,
> + errmsg("skipping slot synchronization as the received slot sync"
> +" LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X",
> +LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> +remote_slot->name,
> +LSN_FORMAT_ARGS(latestFlushPtr)));
> +
> + return false;
> + }
>
> Previously in v65 this was an elog, but now it is an ereport. But
> since this is a sanity check condition that "should never pass" wasn't
> the elog the more appropriate choice?

We realized that this scenario can be hit frequently when the user has
not set standby_slot_names on the primary, and thus ereport makes more
sense. But I agree that the comment is misleading. We will adjust the
comment in the next version.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-01-30 Thread shveta malik
On Tue, Jan 30, 2024 at 11:31 AM Amit Kapila  wrote:
>
> In this regard, I feel we don't need to dump/restore the 'FAILOVER'
> option non-binary upgrade paths similar to the 'ENABLE' option. For
> binary upgrade, if the failover option is enabled, then we can enable
> it using Alter Subscription SET (failover=true). Let's add one test
> corresponding to this behavior in
> postgresql\src\bin\pg_upgrade\t\004_subscription.

Changed pg_dump behaviour as suggested and added additional test.

> Additionally, we need to update the pg_dump docs for the 'failover'
> option. See "When dumping logical replication subscriptions, .." [1].
> I think we also need to update the connect option docs in CREATE
> SUBSCRIPTION [2].

Updated docs.

> [1] - https://www.postgresql.org/docs/devel/app-pgdump.html
> [2] - https://www.postgresql.org/docs/devel/sql-createsubscription.html

PFA v73-0001 which addresses the above comments. Other patches will be
rebased and posted after pushing this one. Thanks Hou-San for adding
pg_upgrade test for failover.

thanks
Shveta


v73-0001-Add-a-failover-option-to-subscriptions.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-02-01 Thread shveta malik
On Thu, Feb 1, 2024 at 11:21 AM Peter Smith  wrote:
>
> Here are some review comments for v740001.

Thanks Peter for the feedback.

> ==
> src/sgml/logicaldecoding.sgml
>
> 1.
> +   
> +Replication Slot Synchronization
> +
> + A logical replication slot on the primary can be synchronized to the hot
> + standby by enabling the failover option during slot
> + creation and setting
> +  linkend="guc-enable-syncslot">enable_syncslot
> + on the standby. For the synchronization
> + to work, it is mandatory to have a physical replication slot between the
> + primary and the standby, and
> +  linkend="guc-hot-standby-feedback">hot_standby_feedback
> + must be enabled on the standby. It is also necessary to specify a valid
> + dbname in the
> +  linkend="guc-primary-conninfo">primary_conninfo
> + string, which is used for slot synchronization and is ignored
> for streaming.
> +
>
> IMO we don't need to repeat that last part ", which is used for slot
> synchronization and is ignored for streaming." because that is a
> detail about the primary_conninfo GUC, and the same information is
> already described in that GUC section.

Modified in v75.

> ==
>
> 2. ALTER_REPLICATION_SLOT slot_name ( option [, ...] ) #
>
>   
> -  If true, the slot is enabled to be synced to the standbys.
> +  If true, the slot is enabled to be synced to the standbys
> +  so that logical replication can be resumed after failover.
>   
>
> This also should have the sentence "The default is false.", e.g. the
> same as the same option in CREATE_REPLICATION_SLOT says.

I have not added this. I feel the default-value details should be
present in the 'CREATE' part; they are not meaningful for the "ALTER"
part. ALTER does not have any defaults; it just modifies the options
given by the user.

> ==
> synchronize_one_slot
>
> 3.
> + /*
> + * Make sure that concerned WAL is received and flushed before syncing
> + * slot to target lsn received from the primary server.
> + *
> + * This check will never pass if on the primary server, user has
> + * configured standby_slot_names GUC correctly, otherwise this can hit
> + * frequently.
> + */
> + latestFlushPtr = GetStandbyFlushRecPtr(NULL);
> + if (remote_slot->confirmed_lsn > latestFlushPtr)
>
> BEFORE
> This check will never pass if on the primary server, user has
> configured standby_slot_names GUC correctly, otherwise this can hit
> frequently.
>
> SUGGESTION (simpler way to say the same thing?)
> This will always be the case unless the standby_slot_names GUC is not
> correctly configured on the primary server.

That is not quite true. It will not hit this condition "always", but it
has higher chances of hitting it when standby_slot_names is not
configured. I think you meant 'unless the standby_slot_names GUC is
correctly configured'. I feel the current comment gives clearer (less
confusing) info and thus I have not changed it for the time being. I
can reconsider if I get more comments there.

> 4.
> + /* User created slot with the same name exists, raise ERROR. */
>
> /User created/User-created/

Modified.

> ~~~
>
> 5. synchronize_slots, and also drop_obsolete_slots
>
> + /*
> + * Use shared lock to prevent a conflict with
> + * ReplicationSlotsDropDBSlots(), trying to drop the same slot while
> + * drop-database operation.
> + */
>
> (same code comment is in a couple of places)
>
> SUGGESTION (while -> during, etc.)
>
> Use a shared lock to prevent conflicting with
> ReplicationSlotsDropDBSlots() trying to drop the same slot during a
> drop-database operation.

Modified.

> ~~~
>
> 6. validate_parameters_and_get_dbname
>
> strcmp() just for the empty string "" might be overkill.
>
> 6a.
> + if (PrimarySlotName == NULL || strcmp(PrimarySlotName, "") == 0)
>
> SUGGESTION
> if (PrimarySlotName == NULL || *PrimarySlotName == '\0')
>
> ~~
>
> 6b.
> + if (PrimaryConnInfo == NULL || strcmp(PrimaryConnInfo, "") == 0)
>
> SUGGESTION
> if (PrimaryConnInfo == NULL || *PrimaryConnInfo == '\0')

Modified.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-01 Thread shveta malik
On Wed, Jan 31, 2024 at 2:02 PM Masahiko Sawada  wrote:
>
> ---
> +static char *
> +wait_for_valid_params_and_get_dbname(void)
> +{
> +   char   *dbname;
> +   int rc;
> +
> +   /* Sanity check. */
> +   Assert(enable_syncslot);
> +
> +   for (;;)
> +   {
> +   if (validate_parameters_and_get_dbname(&dbname))
> +   break;
> +   ereport(LOG, errmsg("skipping slot synchronization"));
> +
> +   ProcessSlotSyncInterrupts(NULL);
>
> When reading this function, I expected that the slotsync worker would
> resume working once the parameters became valid, but it was not
> correct. For example, if I changed hot_standby_feedback from off to
> on, the slotsync worker reads the config file, exits, and then
> restarts. Given that the slotsync worker ends up exiting on parameter
> changes anyway, why do we want to have it wait for parameters to
> become valid? IIUC even if the slotsync worker exits when a parameter
> is not valid, it restarts at some intervals.

Thanks for the feedback. Changed this functionality in v75. Now we do
not exit in wait_for_valid_params_and_get_dbname() on a GUC change. We
re-validate the new values and, if they are found valid, carry on with
slot syncing; otherwise we continue waiting.

> ---
> +bool
> +SlotSyncWorkerCanRestart(void)
> +{
> +#define SLOTSYNC_RESTART_INTERVAL_SEC 10
> +
>
> IIUC depending on how busy the postmaster is and the timing, the user
> could wait for 1 min to re-launch the slotsync worker. But I think the
> user might want to re-launch the slotsync worker more quickly for
> example when the slotsync worker restarts due to parameter changes.
> IIUC SloSyncWorkerCanRestart() doesn't consider the fact that the
> slotsync worker previously exited with 0 or 1.

Modified this in v75. As you suggested in [1], we reset
last_start_time on a GUC change before proc_exit, so that the
postmaster restarts the worker immediately without waiting.

> ---
> +   /* We are a normal standby */
> +   valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> +   Assert(!isnull);
>
> What do you mean by "normal standby"?
>
> ---
> +   appendStringInfo(&cmd,
> +"SELECT pg_is_in_recovery(), count(*) = 1"
> +" FROM pg_replication_slots"
> +" WHERE slot_type='physical' AND slot_name=%s",
> +quote_literal_cstr(PrimarySlotName));
>
> I think we need to make "pg_replication_slots" schema-qualified.

Modified.

> ---
> +   errdetail("The primary server slot \"%s\" specified by"
> + " \"%s\" is not valid.",
> + PrimarySlotName, "primary_slot_name"));
>
> and
>
> +   errmsg("slot sync worker will shutdown because"
> +  " %s is disabled", "enable_syncslot"));
>
> It's better to write it in one line for better greppability.

Modified.

[1]: 
https://www.postgresql.org/message-id/CAD21AoAv6FwZ6UPNTj6%3D7A%2B3O2m4utzfL8ZGS6X1EGexikG66A%40mail.gmail.com

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-02 Thread shveta malik
On Fri, Feb 2, 2024 at 12:25 PM Peter Smith  wrote:
>
> Here are some review comments for v750002.

Thanks for the feedback Peter. Addressed all in v76 except one.

> (this is a WIP but this is what I found so far...)

> I wonder if it is better to log all the problems in one go instead of
> making users stumble onto them one at a time after fixing one and then
> hitting the next problem. e.g. just set some variable "all_ok =
> false;" each time instead of all the "return false;"
>
> Then at the end of the function just "return all_ok;"

If we do it this way, then we need to find a way to combine the
messages as well; otherwise the same message will be repeated multiple
times. For the concerned functionality (which needs a one-time config
effort by the user), I feel the existing way looks okay. We may
consider optimizing it if we get more comments here.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-05 Thread shveta malik
On Mon, Feb 5, 2024 at 4:36 PM Amit Kapila  wrote:
>
> I have pushed the first patch. Next, a few comments on 0002 are as follows:

Thanks for the feedback, Amit. Some of these are addressed in v78. The
rest will be addressed in the next version.

> 1.
> +static bool
> +validate_parameters_and_get_dbname(char **dbname, int elevel)
>
> For 0002, we don't need dbname as out parameter. Also, we can rename
> the function to validate_slotsync_params() or something like that.
> Also, for 0003, we don't need to get the dbname from
> wait_for_valid_params_and_get_dbname(), instead there could be a
> common function that can be invoked from validate_slotsync_params()
> and caller of wait function that caches the value of dbname.
>
> The other parameter elevel is also not required for 0002.
>
> 2.
> + /*
> + * Make sure that concerned WAL is received and flushed before syncing
> + * slot to target lsn received from the primary server.
> + */
> + latestFlushPtr = GetStandbyFlushRecPtr(NULL);
> + if (remote_slot->confirmed_lsn > latestFlushPtr)
> + {
> + /*
> + * Can get here only if GUC 'standby_slot_names' on the primary server
> + * was not configured correctly.
> + */
> + ereport(LOG,
> + errmsg("skipping slot synchronization as the received slot sync"
> +" LSN %X/%X for slot \"%s\" is ahead of the standby position %X/%X",
> +LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> +remote_slot->name,
> +LSN_FORMAT_ARGS(latestFlushPtr)));
> +
> + return false;
>
> In the case of a function invocation, this should be an ERROR. We can
> move the comment related to 'standby_slot_names' to a later patch
> where that GUC is introduced. See, if there are other LOGs in the
> patch that needs to be converted to ERROR.
>
> 3. The function pg_sync_replication_slots() should be in file
> slotfuncs.c and common functionality between this function and
> slotsync worker can be exposed via a function in slotsync.c.
>
> 4.
> /*
> + * Using the specified primary server connection, check whether we are
> + * cascading standby and validates primary_slot_name for
> + * non-cascading-standbys.
> + */
> + check_primary_info(wrconn, &am_cascading_standby,
> +&primary_slot_invalid, ERROR);
> +
> + if (am_cascading_standby)
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot synchronize replication slots to a cascading standby"));
>
> primary_slot_invalid is not used in this patch. I think we can allow
> the function can be executed on cascading_standby as well because this
> will be used for the planned switchover.
>
> 5. I don't see any problem with allowing concurrent processes trying
> to sync the same slot at the same time as each process will acquire
> the slot and only one process can acquire the slot at a time, the
> other will get an ERROR.
>
> --
> With Regards,
> Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-02-06 Thread shveta malik
On Tue, Feb 6, 2024 at 7:21 PM Zhijie Hou (Fujitsu)
 wrote:
>
> > ---
> > +/* Slot sync worker objects */
> > +extern PGDLLIMPORT char *PrimaryConnInfo; extern PGDLLIMPORT char
> > +*PrimarySlotName;
> >
> > These two variables are declared also in xlogrecovery.h. Is it intentional? 
> > If so, I
> > think it's better to write comments.
>
> Will address.

Added comments in v79_2.

>
> >
> > ---
> > +   SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid ||
> > '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname
> >
> > and
> >
> > +   SELECT (CASE WHEN r.srsubstate = 'f' THEN
> > pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || 
> > r.srrelid), false)
> >
> > If we use CONCAT function, we can replace '||' with ','.
> >

Modified in v79_2.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-07 Thread shveta malik
On Tue, Feb 6, 2024 at 12:25 PM Bertrand Drouvot
 wrote:
>
> Hi,

> That said, I still think the commit message needs some re-wording, what about?
>
> =
> If a logical slot on the primary is valid but is invalidated on the standby,
> then that slot is dropped and can be recreated on the standby in next
> pg_sync_replication_slots() call provided the slot still exists on the primary
> server. It is okay to recreate such slots as long as these are not consumable
> on the standby (which is the case currently). This situation may occur due to
> the following reasons:
>
> - The max_slot_wal_keep_size on the standby is insufficient to retain WAL
>   records from the restart_lsn of the slot.
> - primary_slot_name is temporarily reset to null and the physical slot is
>   removed.
>
> Changing the primary wal_level to a level lower than logical is only possible
> if the logical slots are removed on the primary, so it's expected to see
> the slots being removed on the standby too (and re-created if they are
> re-created on the primary).
> =

Thanks for the feedback. I have incorporated the suggestions in v80.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-08 Thread shveta malik
On Thu, Feb 8, 2024 at 12:08 PM Peter Smith  wrote:
>
> Here are some review comments for patch v80_2-0001.

Thanks for the feedback, Peter. Addressed the comments in v81.
Attached patch 0001 for early feedback. The rest of the patches need
rebasing and thus will be posted later.

It also addresses comments by Amit in [1].

[1]: 
https://www.postgresql.org/message-id/CAA4eK1Ldhh_kf-qG-m5BKY0R1SkdBSx5j%2BEzwpie%2BH9GPWWOYA%40mail.gmail.com

thanks
Shveta


v81-0001-Add-a-slot-synchronization-function.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-02-08 Thread shveta malik
On Thu, Feb 8, 2024 at 4:03 PM Amit Kapila  wrote:
>
> Few comments on 0001
> ===

Thanks Amit. Addressed these in v81.

> 1.
> + * the slots on the standby and synchronize them. This is done on every call
> + * to SQL function pg_sync_replication_slots.
> >
>
> I think the second sentence can be slightly changed to: "This is done
> by a call to SQL function pg_sync_replication_slots." or "One can call
> SQL function pg_sync_replication_slots to invoke this functionality."

Done.

> 2.
> +update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> {
> ...
> + SpinLockAcquire(&slot->mutex);
> + slot->data.plugin = plugin_name;
> + slot->data.database = remote_dbid;
> + slot->data.two_phase = remote_slot->two_phase;
> + slot->data.failover = remote_slot->failover;
> + slot->data.restart_lsn = remote_slot->restart_lsn;
> + slot->data.confirmed_flush = remote_slot->confirmed_lsn;
> + slot->data.catalog_xmin = remote_slot->catalog_xmin;
> + slot->effective_catalog_xmin = remote_slot->catalog_xmin;
> + SpinLockRelease(&slot->mutex);
> +
> + if (remote_slot->catalog_xmin != slot->data.catalog_xmin)
> + ReplicationSlotsComputeRequiredXmin(false);
> +
> + if (remote_slot->restart_lsn != slot->data.restart_lsn)
> + ReplicationSlotsComputeRequiredLSN();
> ...
> }
>
> How is it possible that after assigning the values from remote_slot
> they can differ from local slot values?

It was a mistake made while fixing comments in previous versions.
Corrected it now. Thanks for catching it.

> 3.
> + /*
> + * Find the oldest existing WAL segment file.
> + *
> + * Normally, we can determine it by using the last removed segment
> + * number. However, if no WAL segment files have been removed by a
> + * checkpoint since startup, we need to search for the oldest segment
> + * file currently existing in XLOGDIR.
> + */
> + oldest_segno = XLogGetLastRemovedSegno() + 1;
> +
> + if (oldest_segno == 1)
> + oldest_segno = XLogGetOldestSegno(0);
>
> I feel this way isn't there a risk that XLogGetOldestSegno() will get
> us the seg number from some previous timeline which won't make sense
> to compare segno in reserve_wal_for_local_slot. Shouldn't you need to
> fetch the current timeline and send as a parameter to this function as
> that is the timeline on which standby is communicating with primary.

Yes, modified it.

> 4.
> + if (remote_slot->confirmed_lsn > latestFlushPtr)
> + ereport(ERROR,
> + errmsg("skipping slot synchronization as the received slot sync"
>
> I think the internal errors should be reported with elog as you have
> done at other palces in the patch.

Done.

> 5.
> +synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
> {
> ...
> + /*
> + * Copy the invalidation cause from remote only if local slot is not
> + * invalidated locally, we don't want to overwrite existing one.
> + */
> + if (slot->data.invalidated == RS_INVAL_NONE)
> + {
> + SpinLockAcquire(&slot->mutex);
> + slot->data.invalidated = remote_slot->invalidated;
> + SpinLockRelease(&slot->mutex);
> +
> + /* Make sure the invalidated state persists across server restart */
> + ReplicationSlotMarkDirty();
> + ReplicationSlotSave();
> + slot_updated = true;
> + }
> ...
> }
>
> Do we need to copy the 'invalidated' from remote to local if both are
> same? I think this will happen for each slot each time because
> normally slots won't be invalidated ones, so there is needless writes.

It is not needed every time. Optimized it: now we copy only if the
local slot's 'invalidated' value is RS_INVAL_NONE while the remote
slot's value is not RS_INVAL_NONE.

> 6.
> + * Returns TRUE if any of the slots gets updated in this sync-cycle.
> + */
> +static bool
> +synchronize_slots(WalReceiverConn *wrconn)
> ...
> ...
>
> +void
> +SyncReplicationSlots(WalReceiverConn *wrconn)
> +{
> + PG_TRY();
> + {
> + validate_primary_slot_name(wrconn);
> +
> + (void) synchronize_slots(wrconn);
>
> For the purpose of 0001, synchronize_slots() doesn't seems to use
> return value. So, I suggest to change it accordingly and move the
> return value in the required patch.

Modified it. Also changed return values of all related internal
functions which were returning slot_updated.

> 7.
> + /*
> + * The primary_slot_name is not set yet or WALs not received yet.
> + * Synchronization is not possible if the walreceiver is not started.
> + */
> + latestWalEnd = GetWalRcvLatestWalEnd();
> + SpinLockAcquire(&WalRcv->mutex);
> + if ((WalRcv->slotname[0] == '\0') ||
> + XLogRecPtrIsInvalid(latestWalEnd))
> + {
> + SpinLockRelease(&WalRcv->mutex);
> + return false;
>
> For the purpose of 0001, we should give WARNING here.

I will fix it in the next version. Sorry, I somehow missed it this time.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-08 Thread shveta malik
On Thu, Feb 8, 2024 at 4:31 PM shveta malik  wrote:
>
> On Thu, Feb 8, 2024 at 12:08 PM Peter Smith  wrote:
> >
> > Here are some review comments for patch v80_2-0001.
>
> Thanks for the feedback Peter. Addressed the comments in v81.

Missed to mention: Hou-san helped in addressing some of these comments
in v81. Thanks, Hou-san.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-12 Thread shveta malik
On Tue, Feb 13, 2024 at 6:45 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Monday, February 12, 2024 5:40 PM Amit Kapila  
> wrote:
>
> Thanks for the comments, I have addressed them.
>
> Here is the new version patch which addressed above and
> most of Bertrand's comments.

Thanks for the patch.

I am trying to run valgrind on patch001. I followed the steps given in
[1]. It ended up generating 850 log files. Is there a way to figure
out that we have a memory related problem w/o going through each log
file manually?

I also tried running the steps with '--leak-check=summary' (in the
first run, it was '--leak-check=no' as suggested in the wiki) with and
without the patch and tried comparing the output manually for a few of
them. I found the output more or less the same. But this is a mammoth
task if we have to do it manually for 850 files. So any pointers here?

For reference:

Sample log file with '--leak-check=no'
==00:00:08:44.321 250746== HEAP SUMMARY:
==00:00:08:44.321 250746== in use at exit: 1,298,274 bytes in 290 blocks
==00:00:08:44.321 250746==   total heap usage: 11,958 allocs, 7,005
frees, 8,175,630 bytes allocated
==00:00:08:44.321 250746==
==00:00:08:44.321 250746== For a detailed leak analysis, rerun with:
--leak-check=full
==00:00:08:44.321 250746==
==00:00:08:44.321 250746== For lists of detected and suppressed
errors, rerun with: -s
==00:00:08:44.321 250746== ERROR SUMMARY: 0 errors from 0 contexts
(suppressed: 0 from 0)


Sample log file with '--leak-check=summary'
==00:00:00:27.300 265785== HEAP SUMMARY:
==00:00:00:27.300 265785== in use at exit: 1,929,907 bytes in 310 blocks
==00:00:00:27.300 265785==   total heap usage: 71,677 allocs, 7,754
frees, 95,750,897 bytes allocated
==00:00:00:27.300 265785==
==00:00:00:27.394 265785== LEAK SUMMARY:
==00:00:00:27.394 265785==definitely lost: 20,507 bytes in 171 blocks
==00:00:00:27.394 265785==indirectly lost: 16,419 bytes in 61 blocks
==00:00:00:27.394 265785==  possibly lost: 354,670 bytes in 905 blocks
==00:00:00:27.394 265785==still reachable: 592,586 bytes in 1,473 blocks
==00:00:00:27.394 265785== suppressed: 0 bytes in 0 blocks
==00:00:00:27.394 265785== Rerun with --leak-check=full to see details
of leaked memory
==00:00:00:27.394 265785==
==00:00:00:27.394 265785== For lists of detected and suppressed
errors, rerun with: -s
==00:00:00:27.394 265785== ERROR SUMMARY: 0 errors from 0 contexts
(suppressed: 0 from 0)
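As one rough way to triage that many logs without reading each one by hand (this is only a sketch, not part of the patch; the directory and file names below are hypothetical stand-ins for the ~850 generated logs), a small script can flag just the files whose LEAK SUMMARY reports a non-zero "definitely lost" count:

```python
import pathlib
import re
import tempfile

# Two sample logs modeled on the valgrind output shown above;
# a real run would point 'logdir' at the directory holding all logs.
logdir = pathlib.Path(tempfile.mkdtemp())
(logdir / "vg-001.log").write_text(
    "==00:00:08:44.321 250746== LEAK SUMMARY:\n"
    "==00:00:08:44.321 250746==    definitely lost: 0 bytes in 0 blocks\n")
(logdir / "vg-002.log").write_text(
    "==00:00:00:27.394 265785== LEAK SUMMARY:\n"
    "==00:00:00:27.394 265785==    definitely lost: 20,507 bytes in 171 blocks\n")

lost_re = re.compile(r"definitely lost:\s*([\d,]+) bytes")

def suspicious_logs(directory):
    """Return names of logs whose 'definitely lost' byte count is non-zero."""
    hits = []
    for log in sorted(directory.glob("*.log")):
        m = lost_re.search(log.read_text())
        # Logs from a '--leak-check=no' run have no such line and are skipped.
        if m and int(m.group(1).replace(",", "")) > 0:
            hits.append(log.name)
    return hits

print(suspicious_logs(logdir))  # only the log with a non-zero leak count
```

This narrows the manual comparison down to the handful of logs that actually report definite leaks.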


[1]: https://wiki.postgresql.org/wiki/Valgrind

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-13 Thread shveta malik
On Fri, Feb 9, 2024 at 10:04 AM Amit Kapila  wrote:
>
> +reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
> {
> ...
> + /*
> + * Find the oldest existing WAL segment file.
> + *
> + * Normally, we can determine it by using the last removed segment
> + * number. However, if no WAL segment files have been removed by a
> + * checkpoint since startup, we need to search for the oldest segment
> + * file currently existing in XLOGDIR.
> + */
> + oldest_segno = XLogGetLastRemovedSegno() + 1;
> +
> + if (oldest_segno == 1)
> + {
> + TimeLineID cur_timeline;
> +
> + GetWalRcvFlushRecPtr(NULL, &cur_timeline);
> + oldest_segno = XLogGetOldestSegno(cur_timeline);
> ...
> ...
>
> This means that if the restart_lsn of the slot is from the prior
> timeline then the standby needs to wait for longer times to sync the
> slot. Ideally, it should be okay because I don't think even if
> restart_lsn of the slot may be from some prior timeline than the
> current flush timeline on standby, how often that case can happen?

I tested this behaviour on the v85 patch and it is working as expected,
i.e. if the remote_slot's lsn belongs to a prior timeline, then on
executing the pg_sync_replication_slots() function, it creates a
temporary slot and waits for the primary to catch up. And once the
primary catches up, the next execution of the SQL function persists
the slot and syncs it.

Setup: primary-->standby1-->standby2

Steps:
1) Insert data on primary. It gets replicated to both standbys.
2) Create logical slot on primary and execute
pg_sync_replication_slots() on standby1. The slot gets synced and
persisted on standby1.
3) Shutdown standby2.
4) Insert data on primary. It gets replicated to standby1.
5) Shutdown primary and promote standby1.
6) Insert some data on standby1/new primary directly.
7) Start standby2: it now needs to catch up on the old data of
timeline1 (from step 4) plus the new data of timeline2 (from step 6).
It does that. On reaching the end of the old timeline, the walreceiver
gets restarted and starts streaming using the new timeline.
8) Execute pg_sync_replication_slots() on standby2 to sync the slot.
Now the remote_slot's lsn belongs to a prior timeline on standby2. In
my test run, the remote_slot's lsn belonged to segno=4 on standby2,
while the oldest segno of the current_timeline(2) was 6. Thus it
created the slot locally with an lsn belonging to the oldest segno 6
of the current_timeline(2) but did not persist it, as the
remote_slot's lsn was behind.
9) Now on standby1/new-primary, advance the logical slot by calling
pg_replication_slot_advance().
10) Execute pg_sync_replication_slots() again on standby2. Now the
local temporary slot gets persisted, as the restart_lsn of the primary
has caught up.

thanks
Shveta




Re: About a recently-added message

2024-02-14 Thread shveta malik
On Thu, Feb 15, 2024 at 8:26 AM Amit Kapila  wrote:
>
> On Wed, Feb 14, 2024 at 7:51 PM Euler Taveira  wrote:
> >
> > On Wed, Feb 14, 2024, at 8:45 AM, Amit Kapila wrote:
> >
> > Now, I am less clear about whether to quote "logical" or not in the
> > above message. Do you have any suggestions?
> >
> >
> > The possible confusion comes from the fact that the sentence contains "must 
> > be"
> > in the middle of a comparison expression. For pg_createsubscriber, we are 
> > using
> >
> > publisher requires wal_level >= logical
> >
> > I suggest to use something like
> >
> > slot synchronization requires wal_level >= logical
> >
>
> This sounds like a better idea but shouldn't we directly use this in
> 'errmsg' instead of a separate 'errhint'? Also, if change this, then
> we should make a similar change for other messages in the same
> function.

+1 on changing the msg(s) in the suggested way. Please find the patch
for the same. It also removes the double quotes around the variable names.

thanks
Shveta


v1-0001-Fix-quotation-of-variable-names.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-02-18 Thread shveta malik
On Sun, Feb 18, 2024 at 7:40 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Friday, February 16, 2024 6:41 PM shveta malik  
> wrote:
> Thanks for the patch. Here are few comments:

Thanks for the comments.

>
> 2.
>
> +static bool
> +validate_remote_info(WalReceiverConn *wrconn, int elevel)
> ...
> +
> +   return (!remote_in_recovery && primary_slot_valid);
>
> The primary_slot_valid could be uninitialized in this function.

return (!remote_in_recovery && primary_slot_valid);

Here, if remote_in_recovery is true, it will not even read
primary_slot_valid. It will read primary_slot_valid only if
remote_in_recovery is false, and in such a case primary_slot_valid
will always be initialized in the else block above. Let me know if you
still feel we should initialize this to some default.

thanks
Shveta




Re: A new message seems missing a punctuation

2024-02-18 Thread shveta malik
On Mon, Feb 19, 2024 at 10:31 AM Robert Haas  wrote:
>
> On Mon, Feb 19, 2024 at 10:10 AM Kyotaro Horiguchi
>  wrote:
> > A recent commit (7a424ece48) added the following message:
> >
> > > could not sync slot information as remote slot precedes local slot:
> > > remote slot "%s": LSN (%X/%X), catalog xmin (%u) local slot: LSN
> > > (%X/%X), catalog xmin (%u)
> >
> > Since it is a bit overloaded but doesn't have a separator between
> > "catalog xmin (%u)" and "local slot:", it is somewhat cluttered. Don't
> > we need something, for example a semicolon there to improve
> > readability and reduce clutter?
>
> I think maybe we should even revise the message even more. In most
> places we do not just print out a whole bunch of values, but use a
> sentence to connect them.

I have tried to reword the msg; please have a look.

thanks
Shveta


v1-0001-Reword-LOG-msg-for-slot-sync.patch
Description: Binary data


Re: About a recently-added message

2024-02-18 Thread shveta malik
On Mon, Feb 19, 2024 at 11:10 AM Amit Kapila  wrote:
>
> On Thu, Feb 15, 2024 at 11:49 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Thu, 15 Feb 2024 09:22:23 +0530, shveta malik  
> > wrote in
> > >
> > > +1 on changing the msg(s) suggested way. Please find the patch for the
> > > same. It also removes double quotes around the variable names
> >
> > Thanks for the discussion.
> >
> > With a translator hat on, I would be happy if I could determine
> > whether a word requires translation with minimal background
> > information. In this case, a translator needs to know which values
> > wal_level can take. It's relatively easy in this case, but I'm not
> > sure if this is always the case. Therefore, I would be slightly
> > happier if "logical" were double-quoted.
> >
>
> I see that we use "logical" in double quotes in various error
> messages. For example: "wal_level must be set to \"replica\" or
> \"logical\" at server start". So following that we can use the double
> quotes here as well.

Okay. Now that we will have double quotes around "logical", do you
prefer the existing way of giving the error msg or the changed one?

Existing:
errmsg("bad configuration for slot synchronization"),
errhint("wal_level must be >= logical."));

errmsg("bad configuration for slot synchronization"),
errhint("%s must be defined.", "primary_conninfo"));

The changed one:
errmsg("slot synchronization requires wal_level >= logical"));

errmsg("slot synchronization requires %s to be defined",
  "primary_conninfo"));

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-19 Thread shveta malik
On Mon, Feb 19, 2024 at 5:32 PM Amit Kapila  wrote:
>
> Few comments on 0001

Thanks for the feedback.

> 
> 1. I think it is better to error out when the valid GUC or option is
> not set in ensure_valid_slotsync_params() and
> ensure_valid_remote_info() instead of waiting. And we shouldn't start
> the worker in the first place if not all required GUCs are set. This
> will additionally simplify the code a bit.

Sure, removed 'ensure' functions. Moved the validation checks to the
postmaster before starting the slot sync worker.

> 2.
> +typedef struct SlotSyncWorkerCtxStruct
>  {
> - /* prevents concurrent slot syncs to avoid slot overwrites */
> + pid_t pid;
> + bool stopSignaled;
>   bool syncing;
> + time_t last_start_time;
>   slock_t mutex;
> -} SlotSyncCtxStruct;
> +} SlotSyncWorkerCtxStruct;
>
> I think we don't need to change the name of this struct as this can be
> used both by the worker and the backend. We can probably add the
> comment to indicate that all the fields except 'syncing' are used by
> slotsync worker.

Modified.

> 3. Similar to above, function names like SlotSyncShmemInit() shouldn't
> be changed to SlotSyncWorkerShmemInit().

Modified.

> 4.
> +ReplSlotSyncWorkerMain(int argc, char *argv[])
> {
> ...
> + on_shmem_exit(slotsync_worker_onexit, (Datum) 0);
> ...
> + before_shmem_exit(slotsync_failure_callback, PointerGetDatum(wrconn));
> ...
> }
>
> Do we need two separate callbacks? Can't we have just one (say
> slotsync_failure_callback) that cleans additional things in case of
> slotsync worker? And, if we need both those callbacks then please add
> some comments for both and why one is before_shmem_exit() and the
> other is on_shmem_exit().

I think we can merge these now. Earlier, 'on_shmem_exit' was needed to
avoid a race condition between the startup and slot sync worker
processes when dropping 'i' slots on promotion. Now we do not have any
such scenario. But I need some time to analyze it well. Will do it in
the next version.

> In addition to the above, I have made a few changes in the comments
> and code (cosmetic code changes). Please include those in the next
> version if you find them okay. You need to rename .txt file to remove
> .txt and apply atop v90-0001*.

Sure, included these.

Please find the patch001 attached. I will rebase the rest of the
patches and post them tomorrow.

thanks
Shveta


v91-0001-Add-a-new-slotsync-worker.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-02-19 Thread shveta malik
On Tue, Feb 20, 2024 at 8:25 AM Masahiko Sawada  wrote:
>
>
> I've reviewed the v91 patch. Here are random comments:

Thanks for the comments.

> ---
>  /*
>   * Checks the remote server info.
>   *
> - * We ensure that the 'primary_slot_name' exists on the remote server and the
> - * remote server is not a standby node.
> + * Check whether we are a cascading standby. For non-cascading standbys, it
> + * also ensures that the 'primary_slot_name' exists on the remote server.
>   */
>
> IIUC what the validate_remote_info() does doesn't not change by this
> patch, so the previous comment seems to be clearer to me.
>
> ---
> if (remote_in_recovery)
> +   {
> +   /*
> +* If we are a cascading standby, no need to check further for
> +* 'primary_slot_name'.
> +*/
> ereport(ERROR,
> errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("cannot synchronize replication slots from a
> standby server"));
> +   }
> +   else
> +   {
> +   boolprimary_slot_valid;
>
> -   primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> -   Assert(!isnull);
> +   primary_slot_valid = DatumGetBool(slot_getattr(tupslot, 2, &isnull));
> +   Assert(!isnull);
>
> -   if (!primary_slot_valid)
> -   ereport(ERROR,
> -   errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -   errmsg("bad configuration for slot synchronization"),
> -   /* translator: second %s is a GUC variable name */
> -   errdetail("The replication slot \"%s\" specified by
> \"%s\" does not exist on the primary server.",
> - PrimarySlotName, "primary_slot_name"));
> +   if (!primary_slot_valid)
> +   ereport(ERROR,
> +   errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +   errmsg("bad configuration for slot synchronization"),
> +   /* translator: second %s is a GUC variable name */
> +   errdetail("The replication slot \"%s\" specified
> by \"%s\" does not exist on the primary server.",
> + PrimarySlotName, "primary_slot_name"));
> +   }
>
> I think it's a refactoring rather than changes required by the
> slotsync worker. We can do that in a separate patch but why do we need
> this change in the first place?

In v90, this refactoring was made due to the fact that
validate_remote_info() was supposed to behave differently for the SQL
function and the slot-sync worker. The SQL function was supposed to
ERROR out while the worker was supposed to LOG and become a no-op, and
thus the change was needed. In v91, we made this functionality the
same, i.e. both the SQL function and the worker will error out, but we
missed removing this refactoring. Thanks for catching this; I will
revert it in the next version. To match the refactoring, I made the
comment change too, and will revert that as well.

> ---
> +ValidateSlotSyncParams(ERROR);
> +
>  /* Load the libpq-specific functions */
>  load_file("libpqwalreceiver", false);
>
> -ValidateSlotSyncParams();
> +(void) CheckDbnameInConninfo();
>
> Is there any reason why we move where to check the parameters?

Earlier, dbname verification was done inside ValidateSlotSyncParams()
and thus 'libpqwalreceiver' needed to be loaded before calling this
function. Now we have moved the dbname verification into a separate
call, and thus the above order got changed. ValidateSlotSyncParams()
is a common function used by the SQL function and the worker. Earlier,
the slot sync worker was checking all the GUCs after starting up and
was exiting each time any GUC was invalid. It was suggested that it
would be better to check the GUCs before starting the slot sync worker,
in the postmaster itself, which moved the ValidateSlotSyncParams() call
to the postmaster (see SlotSyncWorkerAllowed). But it was not a good
idea to load libpq in the postmaster, and thus we moved the
libpq-related verification out of ValidateSlotSyncParams(). This
resulted in the above change. I hope that answers your query.

thanks
Shveta




Re: About a recently-added message

2024-02-20 Thread shveta malik
On Tue, Feb 20, 2024 at 2:13 PM Amit Kapila  wrote:
>
> I would prefer the changed ones as those clearly explain the problem
> without additional information.

Okay. Attached is the v2 patch with the changed error msgs and double
quotes around "logical".

thanks
Shveta


v2-0001-Fix-quotation-of-variable-names.patch
Description: Binary data


Re: A new message seems missing a punctuation

2024-02-20 Thread shveta malik
On Tue, Feb 20, 2024 at 5:01 PM Amit Kapila  wrote:
>
>
> We do expose the required information (restart_lsn, catalog_xmin,
> synced, temporary, etc) via pg_replication_slots. So, I feel the LOG
> message here is sufficient to DEBUG (or know the details) when the
> slot sync doesn't succeed.
>

Please find the patch having the suggested changes.

thanks
Shveta


v2-0001-Reword-LOG-msg-for-slot-sync.patch
Description: Binary data


'Shutdown <= SmartShutdown' check while launching processes in postmaster.

2024-02-20 Thread shveta malik
Hi hackers,

I would like to know why we have the 'Shutdown <= SmartShutdown'
check before launching a few processes (WalReceiver, WalSummarizer,
AutoVacuum worker) while the rest of the processes (BGWriter, WalWriter,
Checkpointer, Archiver etc.) do not have any such check. If I have to
launch a new process, what should the criteria be to decide whether I
need this check?

Looking forward to your expert advice.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-20 Thread shveta malik
On Tue, Feb 20, 2024 at 6:19 PM Masahiko Sawada  wrote:
>
> On Tue, Feb 20, 2024 at 12:33 PM shveta malik  wrote:
> Thank you for the explanation. It makes sense to me to move the check.
>
>
> 2. If the wal_level is not logical, the server will need to restart
> anyway to change the wal_level and have the slotsync worker work. Does
> it make sense to have the postmaster exit if the wal_level is not
> logical and sync_replication_slots is enabled? For instance, we have
> similar checks in PostmsaterMain():
>
> if (summarize_wal && wal_level == WAL_LEVEL_MINIMAL)
> ereport(ERROR,
> (errmsg("WAL cannot be summarized when wal_level is
> \"minimal\"")));

Thanks for the feedback. I have addressed it in v93.

thanks
Shveta




Re: 'Shutdown <= SmartShutdown' check while launching processes in postmaster.

2024-02-21 Thread shveta malik
On Wed, Feb 21, 2024 at 10:01 AM Tom Lane  wrote:
>
> shveta malik  writes:
> > I would like to know that why we have 'Shutdown <= SmartShutdown'
> > check before launching few processes (WalReceiver, WalSummarizer,
> > AutoVacuum worker) while rest of the processes (BGWriter, WalWriter,
> > Checkpointer, Archiver etc) do not have any such check. If I have to
> > launch a new process, what shall be the criteria to decide if I need
> > this check?
>
> Children that are stopped by the "if (pmState == PM_STOP_BACKENDS)"
> stanza in PostmasterStateMachine should not be allowed to start
> again later if we are trying to shut down.  (But "smart" shutdown
> doesn't enforce that, since it's a very weak state that only
> prohibits new client sessions.)  The processes that are allowed
> to continue beyond that point are ones that are needed to perform
> the shutdown checkpoint, or useful to make it finish faster.

Thank you for providing the details. It clarifies the situation. Do
you think it would be beneficial to include this as a code comment in
postmaster.c to simplify understanding for future readers?

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-21 Thread shveta malik
On Wed, Feb 21, 2024 at 5:19 PM Amit Kapila  wrote:
>
> A few minor comments:

Thanks for the feedback.

> =
> 1.
> +/*
> + * Is stopSignaled set in SlotSyncCtx?
> + */
> +bool
> +IsStopSignaledSet(void)
> +{
> + bool signaled;
> +
> + SpinLockAcquire(&SlotSyncCtx->mutex);
> + signaled = SlotSyncCtx->stopSignaled;
> + SpinLockRelease(&SlotSyncCtx->mutex);
> +
> + return signaled;
> +}
> +
> +/*
> + * Reset stopSignaled in SlotSyncCtx.
> + */
> +void
> +ResetStopSignaled(void)
> +{
> + SpinLockAcquire(&SlotSyncCtx->mutex);
> + SlotSyncCtx->stopSignaled = false;
> + SpinLockRelease(&SlotSyncCtx->mutex);
> +}
>
> I think these newly introduced functions don't need spinlock to be
> acquired as these are just one-byte read-and-write. Additionally, when
> IsStopSignaledSet() is invoked, there shouldn't be any concurrent
> process to update that value. What do you think?

Yes, we can avoid taking the spinlock here. These functions are invoked
after checking that pmState is PM_RUN, and in that state we do not
expect any other process to write this flag.

> 2.
> +REPL_SLOTSYNC_MAIN "Waiting in main loop of slot sync worker."
> +REPL_SLOTSYNC_SHUTDOWN "Waiting for slot sync worker to shut down."
>
> Let's use REPLICATION instead of REPL. I see other wait events using
> REPLICATION in their names.

Modified.

> 3.
> - * In standalone mode and in autovacuum worker processes, we use a fixed
> - * ID, otherwise we figure it out from the authenticated user name.
> + * In standalone mode, autovacuum worker processes and slot sync worker
> + * process, we use a fixed ID, otherwise we figure it out from the
> + * authenticated user name.
> */
> - if (bootstrap || IsAutoVacuumWorkerProcess())
> + if (bootstrap || IsAutoVacuumWorkerProcess() || IsLogicalSlotSyncWorker())
> {
> InitializeSessionUserIdStandalone();
> am_superuser = true;
>
> IIRC, we discussed this previously and it is safe to make the local
> connection as superuser as we don't consult any user tables, so we can
> probably add a comment where we invoke InitPostgres in slotsync.c

Added the comment. Thanks, Hou-san, for the analysis here and for
providing the comment.

> 4.
> $publisher->safe_psql('postgres',
> - "CREATE PUBLICATION regress_mypub FOR ALL TABLES;");
> + "CREATE PUBLICATION regress_mypub FOR ALL TABLES;"
> +);
>
> Why this change is required in the patch?

Not needed, removed it.

> 5.
> +# Confirm that restart_lsn and of confirmed_flush_lsn lsub1_slot slot
> are synced
> +# to the standby
>
> /and of/; looks like a typo

Modified.

> 6.
> +# Confirm that restart_lsn and of confirmed_flush_lsn lsub1_slot slot
> are synced
> +# to the standby
> +ok( $standby1->poll_query_until(
> + 'postgres',
> + "SELECT '$primary_restart_lsn' = restart_lsn AND
> '$primary_flush_lsn' = confirmed_flush_lsn from pg_replication_slots
> WHERE slot_name = 'lsub1_slot';"),
> + 'restart_lsn and confirmed_flush_lsn of slot lsub1_slot synced to standby');
> +
> ...
> ...
> +# Confirm the synced slot 'lsub1_slot' is retained on the new primary
> +is($standby1->safe_psql('postgres',
> + q{SELECT slot_name FROM pg_replication_slots WHERE slot_name =
> 'lsub1_slot';}),
> + 'lsub1_slot',
> + 'synced slot retained on the new primary');
>
> In both these checks, we should additionally check the 'synced' and
> 'temporary' flags to ensure that they are marked appropriately.

Modified.

Please find patch001 attached. There is a CFBot failure in patch002.
The test added there needs some adjustment. We will rebase and post
the rest of the patches once we fix that issue.

thanks
Shveta


v94-0001-Add-a-new-slotsync-worker.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-02-21 Thread shveta malik
On Thu, Feb 22, 2024 at 10:31 AM shveta malik  wrote:
>
> Please find patch001 attached. There is a CFBot failure in patch002.
> The test added there needs some adjustment. We will rebase and post
> rest of the patches once we fix that issue.
>

There was a recent commit 801792e to improve error messaging in
slotsync.c which resulted in a conflict, so I rebased the patch. There
is no new change in the attached patch.

thanks
Shveta


v94_2-0001-Add-a-new-slotsync-worker.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-02-22 Thread shveta malik
On Thu, Feb 22, 2024 at 3:44 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> Thanks!
>
> Some random comments about v92_001 (Sorry if it has already been discussed
> up-thread):

Thanks for the feedback. The patch was pushed 15 minutes ago. I will
prepare a top-up patch for your comments.

> 1 ===
>
> +* We do not update the 'synced' column from true to false here
>
> Worth to mention from which system view the 'synced' column belongs to?

Sure, I will change it.

> 2 === (Nit)
>
> +#define MIN_WORKER_NAPTIME_MS  200
> +#define MAX_WORKER_NAPTIME_MS  3   /* 30s */
>
> [MIN|MAX]_SLOTSYNC_WORKER_NAPTIME_MS instead? It is used only in slotsync.c so
> more a Nit.

Okay, will change it.

> 3 ===
>
> res = walrcv_exec(wrconn, query, SLOTSYNC_COLUMN_COUNT, slotRow);
> -
> if (res->status != WALRCV_OK_TUPLES)
>
> Line removal intended?

I feel the current style is better, where we do not have a blank line
between the function call and the return-value check.

> 4 ===
>
> +   if (wal_level < WAL_LEVEL_LOGICAL)
> +   {
> +   ereport(ERROR,
> +   errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +   errmsg("slot synchronization requires 
> wal_level >= \"logical\""));
> +   return false;
> +   }
>
> I think the return is not needed here as it won't be reached due to the 
> "ERROR".
> Or should we use "elevel" instead of "ERROR"?

It was suggested to raise an ERROR for the wal_level validation; please
see [1]. But yes, I will remove the return value. Thanks for catching
this.

> 5 ===
>
> +* operate as a superuser. This is safe because the slot sync worker 
> does
> +* not interact with user tables, eliminating the risk of executing
> +* arbitrary code within triggers.
>
> Right. I did not check but if we are using operators in our remote SPI calls
> then it would be worth to ensure they are coming from the pg_catalog schema?
> Using something like "OPERATOR(pg_catalog.=)" using "=" as an example.

Can you please elaborate on this one? I am not sure I understood it.

[1]: 
https://www.postgresql.org/message-id/CAD21AoB2ipSzQb5-o5pEYKie4oTPJTsYR1ip9_wRVrF6HbBWDQ%40mail.gmail.com

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-22 Thread shveta malik
On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot
 wrote:
>
> Suppose that in synchronize_slots() the query would be:
>
> const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
> " restart_lsn, catalog_xmin, two_phase, failover,"
> " database, conflict_reason"
> " FROM pg_catalog.pg_replication_slots"
> " WHERE failover and NOT temporary and 1 = 1";
>
> Then my comment is to rewrite it to:
>
> const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
> " restart_lsn, catalog_xmin, two_phase, failover,"
> " database, conflict_reason"
> " FROM pg_catalog.pg_replication_slots"
> " WHERE failover and NOT temporary and 1 OPERATOR(pg_catalog.=) 1";
>
> to ensure the operator "=" is coming from the pg_catalog schema.
>

Thanks for the details, but slot-sync does not use SPI calls; it uses
libpqrcv calls. So is this change needed?

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-22 Thread shveta malik
On Fri, Feb 23, 2024 at 8:35 AM shveta malik  wrote:
>
> On Thu, Feb 22, 2024 at 4:35 PM Bertrand Drouvot
>  wrote:
> >
> > Suppose that in synchronize_slots() the query would be:
> >
> > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
> > " restart_lsn, catalog_xmin, two_phase, failover,"
> > " database, conflict_reason"
> > " FROM pg_catalog.pg_replication_slots"
> > " WHERE failover and NOT temporary and 1 = 1";
> >
> > Then my comment is to rewrite it to:
> >
> > const char *query = "SELECT slot_name, plugin, confirmed_flush_lsn,"
> > " restart_lsn, catalog_xmin, two_phase, failover,"
> > " database, conflict_reason"
> > " FROM pg_catalog.pg_replication_slots"
> > " WHERE failover and NOT temporary and 1 OPERATOR(pg_catalog.=) 1";
> >
> > to ensure the operator "=" is coming from the pg_catalog schema.
> >
>
> Thanks for the details, but slot-sync does not use SPI calls, it uses
> libpqrcv calls. So is this change needed?

Additionally, I would like to have a better understanding of why it's
necessary and whether it addresses any potential security risks.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-22 Thread shveta malik
On Fri, Feb 23, 2024 at 10:06 AM Zhijie Hou (Fujitsu)
 wrote:
>
>
> I noticed one CFbot failure[1] which is because the tap-test doesn't wait for 
> the
> standby to catch up before promoting, thus the data inserted after promotion
> could not be replicated to the subscriber. Add a wait_for_replay_catchup to 
> fix it.
>
> Apart from this, I also adjusted some variable names in the tap-test to be
> consistent. And added back a mis-removed ProcessConfigFile call.
>
> [1] https://cirrus-ci.com/task/6126787437002752?logs=check_world#L312
>

Thanks for the patches. Had a quick look at v95_2, here are some
trivial comments:


slot.h:
-
1)
extern List *GetStandbySlotList(bool copy);
extern void WaitForStandbyConfirmation(XLogRecPtr wait_for_lsn);
extern void FilterStandbySlots(XLogRecPtr wait_for_lsn,
     List **standby_slots);

The order is different from the one in slot.c

slot.c:
-
2)
warningfmt = _("replication slot \"%s\" specified in parameter \"%s\"
does not exist, ignoring");

GUC names should not have double quotes. Same for each warningfmt in
this function.


3)
errmsg("replication slot \"%s\" specified in parameter \"%s\" does not
have active_pid",

Same here, double quotes around standby_slot_names should be removed


 walsender.c:
--
4)
* Used by logical decoding SQL functions that acquired slot with failover
* enabled.

To be consistent with other such comments in previous patches:
slot with failover enabled --> failover enabled slot

5) Wake up the logical walsender processes with failover-enabled slots

failover-enabled slots  --> failover enabled slots

postgresql.conf.sample:
--
6)
streaming replication standby server slot names that logical walsender
processes will wait for

Is it better to say it like this? (I leave this to your preference)

streaming replication standby server slot names for which logical
walsender processes will wait.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-23 Thread shveta malik
On Fri, Feb 23, 2024 at 1:28 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> Because one could create say the "=" OPERATOR in their own schema, attach a
> function to it doing undesired stuff and change the search_path for the 
> database
> the sync slot worker connects to.
>
> Then this new "=" operator would be used (instead of the pg_catalog.= one),
> triggering the "undesired" function as superuser.

Thanks for the details. I understand it now. We do not use '=' in our
main slots-fetch query, but we do use '=' in the remote-validation
query; see validate_remote_info(). Do you think that, instead of doing
the above, we can override the search_path with an empty string in the
slot-sync case, similar to the logical apply worker and autovacuum
worker cases (see InitializeLogRepWorker(), AutoVacWorkerMain())?

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-25 Thread shveta malik
On Fri, Feb 23, 2024 at 7:41 PM Bertrand Drouvot
 wrote:
>
> Hi,
> > I think to set secure search path for remote connection, the standard 
> > approach
> > could be to extend the code in libpqrcv_connect[1], so that we don't need 
> > to schema
> > qualify all the operators in the queries.
> >
> > And for local connection, I agree it's also needed to add a
> > SetConfigOption("search_path", "" call in the slotsync worker.
> >
> > [1]
> > libpqrcv_connect
> > ...
> >   if (logical)
> > ...
> >   res = libpqrcv_PQexec(conn->streamConn,
> > 
> > ALWAYS_SECURE_SEARCH_PATH_SQL);
> >
>
> Agree, something like in the attached? (it's .txt to not disturb the CF bot).

Thanks for the patch; the changes look good. I have incorporated it in
the patch which addresses the rest of your comments in [1]. I have
attached the patch as .txt.

[1]: 
https://www.postgresql.org/message-id/ZdcejBDCr%2BwlVGnO%40ip-10-97-1-34.eu-west-3.compute.internal

thanks
Shveta
From f1489f783748e85749a576cf21921e37137efd2a Mon Sep 17 00:00:00 2001
From: Shveta Malik 
Date: Thu, 22 Feb 2024 17:11:25 +0530
Subject: [PATCH v1] Top up Patch for commit 93db6cbda0

---
 src/backend/access/transam/xlogrecovery.c | 15 ---
 .../libpqwalreceiver/libpqwalreceiver.c   |  2 +-
 src/backend/replication/logical/slotsync.c| 19 +++
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index d73a49b3e8..9d907bf0e4 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1472,13 +1472,14 @@ FinishWalRecovery(void)
 * Shutdown the slot sync worker to drop any temporary slots acquired by
 * it and to prevent it from keep trying to fetch the failover slots.
 *
-* We do not update the 'synced' column from true to false here, as any
-* failed update could leave 'synced' column false for some slots. This
-* could cause issues during slot sync after restarting the server as a
-* standby. While updating the 'synced' column after switching to the 
new
-* timeline is an option, it does not simplify the handling for the
-* 'synced' column. Therefore, we retain the 'synced' column as true 
after
-* promotion as it may provide useful information about the slot origin.
+* We do not update the 'synced' column in 'pg_replication_slots' system
+* view from true to false here, as any failed update could leave 
'synced'
+* column false for some slots. This could cause issues during slot sync
+* after restarting the server as a standby. While updating the 'synced'
+* column after switching to the new timeline is an option, it does not
+* simplify the handling for the 'synced' column. Therefore, we retain 
the
+* 'synced' column as true after promotion as it may provide useful
+* information about the slot origin.
 */
ShutDownSlotSync();
 
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c 
b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 04271ee703..a30528a5f6 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -271,7 +271,7 @@ libpqrcv_connect(const char *conninfo, bool replication, 
bool logical,
 errhint("Target server's authentication method 
must be changed, or set password_required=false in the subscription 
parameters.")));
}
 
-   if (logical)
+   if (logical || !replication)
{
PGresult   *res;
 
diff --git a/src/backend/replication/logical/slotsync.c 
b/src/backend/replication/logical/slotsync.c
index 36773cfe73..5bb9e5d8b3 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -105,10 +105,10 @@ bool  sync_replication_slots = false;
  * (within a MIN/MAX range) according to slot activity. See
  * wait_for_slot_activity() for details.
  */
-#define MIN_WORKER_NAPTIME_MS  200
-#define MAX_WORKER_NAPTIME_MS  30000   /* 30s */
+#define MIN_SLOTSYNC_WORKER_NAPTIME_MS  200
+#define MAX_SLOTSYNC_WORKER_NAPTIME_MS  30000  /* 30s */
 
-static long sleep_ms = MIN_WORKER_NAPTIME_MS;
+static long sleep_ms = MIN_SLOTSYNC_WORKER_NAPTIME_MS;
 
 /* The restart interval for slot sync work used by postmaster */
 #define SLOTSYNC_RESTART_INTERVAL_SEC 10
@@ -924,12 +924,9 @@ ValidateSlotSyncParams(int elevel)
 * in this case regardl

Regardign RecentFlushPtr in WalSndWaitForWal()

2024-02-26 Thread shveta malik
Hi hackers,

I would like to understand why we have code [1] that retrieves
RecentFlushPtr in WalSndWaitForWal() outside of the loop. We utilize
RecentFlushPtr later within the loop, but prior to that, we already
have [2]. Wouldn't [2] alone be sufficient?

Just to check the impact, I ran 'make check-world' after removing [1],
and at least did not see any issue exposed by the tests.

Any thoughts?

[1]:
/* Get a more recent flush pointer. */
if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr(NULL);
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);

[2]:
/* Update our idea of the currently flushed position. */
else if (!RecoveryInProgress())
RecentFlushPtr = GetFlushRecPtr(NULL);
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-26 Thread shveta malik
On Mon, Feb 26, 2024 at 5:18 PM Amit Kapila  wrote:
>
> > > > +   if (!ok)
> > > > +   GUC_check_errdetail("List syntax is invalid.");
> > > > +
> > > > +   /*
> > > > +* If there is a syntax error in the name or if the replication 
> > > > slots'
> > > > +* data is not initialized yet (i.e., we are in the startup 
> > > > process), skip
> > > > +* the slot verification.
> > > > +*/
> > > > +   if (!ok || !ReplicationSlotCtl)
> > > > +   {
> > > > +   pfree(rawname);
> > > > +   list_free(elemlist);
> > > > +   return ok;
> > > > +   }
> > > >
> > > > we are testing the "ok" value twice, what about using if...else if... 
> > > > instead and
> > > > test it once? If so, it might be worth to put the:
> > > >
> > > > "
> > > > +   pfree(rawname);
> > > > +   list_free(elemlist);
> > > > +   return ok;
> > > > "
> > > >
> > > > in a "goto".
> > >
> > > There were comments to remove the 'goto' statement and avoid
> > > duplicate free code, so I prefer the current style.
> >
> > The duplicate free code would come from the if...else if... rewrite but then
> > the "goto" would remove it, so I'm not sure to understand your point.
> >
>
> I think Hou-San wants to say that there was previously a comment to
> remove goto and now you are saying to introduce it. But, I think we
> can avoid both code duplication and goto, if the first thing we check
> in the function is ReplicationSlotCtl and return false if the same is
> not set. Won't that be better?

I think we cannot do that, as we need to check at least the syntax
before we return due to a NULL ReplicationSlotCtl. We get a NULL
ReplicationSlotCtl during instance startup in
check_standby_slot_names() because the postmaster first loads the GUC
table and then initializes the shared memory for replication slots; see
the calls to InitializeGUCOptions() and
CreateSharedMemoryAndSemaphores() in PostmasterMain(). FWIW, I do not
have any issue with the current code either, but if we have to change
it, is [1] any better?

[1]:
check_standby_slot_names()
{

if (!ok)
{
GUC_check_errdetail("List syntax is invalid.");
}
else if (ReplicationSlotCtl)
{
   foreach-loop for slot validation
}

pfree(rawname);
list_free(elemlist);
return ok;
}

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-27 Thread shveta malik
On Wed, Feb 28, 2024 at 8:49 AM Amit Kapila  wrote:
>
>
> Few comments:

Thanks for the feedback.

> ===
> 1.
> - if (logical)
> + if (logical || !replication)
>   {
>
> Can we add a comment about connection types that require
> ALWAYS_SECURE_SEARCH_PATH_SQL?
>
> 2.
> Can we add a test case to demonstrate that the '=' operator can be
> hijacked to do different things when the slotsync worker didn't use
> ALWAYS_SECURE_SEARCH_PATH_SQL?
>

Here is the patch with new test added and improved comments.

thanks
Shveta


v2-0001-Fixups-for-commit-93db6cbda0.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-02-28 Thread shveta malik
On Wed, Feb 28, 2024 at 1:33 PM Bertrand Drouvot
 wrote:
>
> Hi,
> A few comments:

Thanks for reviewing.

>
> 1 ===
>
> +* used to run normal SQL queries
>
> s/run normal SQL/run SQL/ ?
>
> As mentioned up-thread I don't like that much the idea of creating such a test
> but if we do then here are my comments:
>
> 2 ===
>
> +CREATE FUNCTION myschema.myintne(bigint, int)
>
> Should we explain why 'bigint, int' is important here (instead of
> 'int, int')?
>
> 3 ===
>
> +# stage of syncing newly created slots. If the worker was not prepared
> +# to handle such attacks, it would have failed during
>
> Worth to mention the underlying check / function that would get an 
> "unexpected"
> result?
>
> Except for the above (nit) comments the patch looks good to me.

Here is the patch which addresses the above comments. I also optimized
the test a little: we now use the pg_sync_replication_slots() function
instead of the worker to test operator redirection using search_path.
This has been done to simplify the test case and reduce the added
runtime.

thanks
Shveta


v3-0001-Fixups-for-commit-93db6cbda0.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-02-29 Thread shveta malik
On Thu, Feb 29, 2024 at 7:04 AM Zhijie Hou (Fujitsu)
 wrote:
>
> Here is the v101 patch set which addressed above comments.
>

Thanks for the patch. A few comments:

1) Shall we mention in the docs that shutdown will wait for the
standbys in standby_slot_names to confirm receiving WAL?

Suggestion for logicaldecoding.sgml:

When standby_slot_names is utilized, the primary
server will not completely shut down until the corresponding standbys,
associated with the physical replication slots specified in
standby_slot_names, have confirmed receiving the
WAL up to the latest flushed position on the primary server.

slot.c
2)
/*
 * If a logical slot name is provided in standby_slot_names, report
 * a message and skip it. Although logical slots are disallowed in
 * the GUC check_hook(validate_standby_slots), it is still
 * possible for a user to drop an existing physical slot and
 * recreate a logical slot with the same name.
 */

This is not completely true; we can still specify a logical slot at
instance start and it will be accepted.

Suggestion:
/*
 * If a logical slot name is provided in standby_slot_names, report
 * a message and skip it. It is possible for a user to specify a
 * logical slot name in standby_slot_names just before the server
 * startup. The GUC check_hook(validate_standby_slots) cannot
 * validate such a slot during startup, as the ReplicationSlotCtl
 * shared memory is not initialized by that time. It is also
 * possible for a user to drop an existing physical slot and
 * recreate a logical slot with the same name.
 */

3. Wait for physical standby to confirm receiving the given lsn

standby -->standbys


4.
In StandbyConfirmedFlush(), is it better to have below errdetail in
all problematic cases:
Logical replication is waiting on the standby associated with \"%s\

We have it only for the inactive-pid case, but we are waiting in all cases.


thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-02-29 Thread shveta malik
On Wed, Feb 28, 2024 at 4:51 PM Amit Kapila  wrote:
>
> On Wed, Feb 28, 2024 at 3:26 PM shveta malik  wrote:
> >
> >
> > Here is the patch which addresses the above comments. Also optimized
> > the test a little bit. Now we use pg_sync_replication_slots() function
> > instead of worker to test the operator-redirection using search-patch.
> > This has been done to simplify the test case and reduce the added
> > time.
> >
>
> I have slightly adjusted the comments in the attached, otherwise, LGTM.

This patch was pushed (commit: b3f6b14) and it resulted in BF failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-02-29%2012%3A49%3A27

The concerned log on standby1:
2024-02-29 14:23:16.738 UTC [3908:4]
040_standby_failover_slots_sync.pl LOG:  statement: SELECT
pg_sync_replication_slots();
The system cannot find the file specified.
2024-02-29 14:23:16.971 UTC [3908:5]
040_standby_failover_slots_sync.pl ERROR:  could not connect to the
primary server: connection to server at "127.0.0.1", port 65352
failed: FATAL:  SSPI authentication failed for user "repl_role"
2024-02-29 14:23:16.971 UTC [3908:6]
040_standby_failover_slots_sync.pl STATEMENT:  SELECT
pg_sync_replication_slots();

It seems authentication is failing for the newly added role. We also
see method=sspi used in the publisher log. We are analysing it further
and will share the findings.


thanks
Shveta




Add comment to specify timeout unit in ConditionVariableTimedSleep()

2024-03-04 Thread shveta malik
Hi hackers,

ConditionVariableTimedSleep() accepts a timeout parameter, but it does
not explicitly state the unit of the timeout anywhere. To determine it,
one needs to dig into the function and find it in the comments of the
internally called WaitLatch(). It would be beneficial to include a
comment in the header of ConditionVariableTimedSleep() specifying that
the timeout is in milliseconds, similar to what we have for other
non-static functions like WaitLatch() and WaitEventSetWait(). Attached
is a patch for the same.

thanks
Shveta


v1-0001-Specify-timeout-unit-in-ConditionVariableTimedSle.patch
Description: Binary data


Re: Synchronizing slots from primary to standby

2024-03-04 Thread shveta malik
On Tue, Mar 5, 2024 at 9:15 AM Amit Kapila  wrote:
>
> On Tue, Mar 5, 2024 at 6:10 AM Peter Smith  wrote:
> >
> > ==
> > src/backend/replication/walsender.c
> >
> > 5. NeedToWaitForWal
> >
> > + /*
> > + * Check if the standby slots have caught up to the flushed position. It
> > + * is good to wait up to the flushed position and then let the WalSender
> > + * send the changes to logical subscribers one by one which are already
> > + * covered by the flushed position without needing to wait on every change
> > + * for standby confirmation.
> > + */
> > + if (NeedToWaitForStandbys(flushed_lsn, wait_event))
> > + return true;
> > +
> > + *wait_event = 0;
> > + return false;
> > +}
> > +
> >
> > 5a.
> > The comment (or part of it?) seems misplaced because it is talking
> > WalSender sending changes, but that is not happening in this function.
> >
>
> I don't think so. This is invoked only by walsender and a static
> function. I don't see any other better place to mention this.
>
> > Also, isn't what this is saying already described by the other comment
> > in the caller? e.g.:
> >
>
> Oh no, here we are explaining the wait order.

I think there is scope for improvement here. The comment inside
NeedToWaitForWal() which states that we need to wait for standbys on
the flush position (and not on each change) should be outside of this
function; it is too embedded. And the comment which states the order of
waiting (first flush, then standby confirmation) should be outside the
for-loop in WalSndWaitForWal(). But yes, we do need both comments.
Attached is a patch (.txt) with the comment improvements; please merge
if appropriate.

thanks
Shveta
From e03332839dd804f3bc38937e677ba87ac10f981b Mon Sep 17 00:00:00 2001
From: Shveta Malik 
Date: Tue, 5 Mar 2024 11:29:52 +0530
Subject: [PATCH v2] Comments improvement.

---
 src/backend/replication/walsender.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index 96f44ba85d..6a082b42eb 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1799,13 +1799,6 @@ NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr 
flushed_lsn,
return true;
}
 
-   /*
-* Check if the standby slots have caught up to the flushed position. It
-* is good to wait up to the flushed position and then let the WalSender
-* send the changes to logical subscribers one by one which are already
-* covered by the flushed position without needing to wait on every 
change
-* for standby confirmation.
-*/
if (NeedToWaitForStandbys(flushed_lsn, wait_event))
return true;
 
@@ -1818,7 +1811,10 @@ NeedToWaitForWal(XLogRecPtr target_lsn, XLogRecPtr 
flushed_lsn,
  *
  * If the walsender holds a logical slot that has enabled failover, we also
  * wait for all the specified streaming replication standby servers to
- * confirm receipt of WAL up to RecentFlushPtr.
+ * confirm receipt of WAL up to RecentFlushPtr. It's beneficial to wait
+ * here for the confirmation from standbys up to RecentFlushPtr rather
+ * than waiting in ReorderBufferCommit() before transmitting each
+ * change to logical subscribers, which is already covered in RecentFlushPtr.
  *
  * Returns end LSN of flushed WAL.  Normally this will be >= loc, but if we
  * detect a shutdown request (either from postmaster or client) we will return
@@ -1847,6 +1843,12 @@ WalSndWaitForWal(XLogRecPtr loc)
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
 
+   /*
+* In the loop, we wait for the necessary WALs to be flushed to disk
+* initially, and then we wait for standbys to catch up. Upon receiving
+* the shutdown signal, we immediately transition to waiting for 
standbys
+* to catch up.
+*/
for (;;)
{
boolwait_for_standby_at_stop = false;
@@ -1877,12 +1879,10 @@ WalSndWaitForWal(XLogRecPtr loc)
XLogBackgroundFlush();
 
/*
-* Within the loop, we wait for the necessary WALs to be 
flushed to
-* disk first, followed by waiting for standbys to catch up if 
there
-* are enough WALs or upon receiving the shutdown signal. To 
avoid the
-* scenario where standbys need to catch up to a newer WAL 
location in
-* each iteration, we update our idea of the currently flushed
-* position only if we are not waiting for standbys to catch up.
+* Update our idea of the currently flushed position, but do it 
only
+* if we haven'

Re: Missing LWLock protection in pgstat_reset_replslot()

2024-03-05 Thread shveta malik
On Tue, Mar 5, 2024 at 1:25 PM Heikki Linnakangas  wrote:

> SearchNamedReplicationSlot() will also acquire the lock in LW_SHARED
> mode, when you pass need_lock=true. So that at least should be changed
> to false.
>

Also don't we need to release the lock when we return here:

/*
* Nothing to do for physical slots as we collect stats only for logical
* slots.
*/
if (SlotIsPhysical(slot))
return;

thanks
Shveta




Re: Missing LWLock protection in pgstat_reset_replslot()

2024-03-05 Thread shveta malik
On Tue, Mar 5, 2024 at 6:52 PM Bertrand Drouvot
 wrote:
>
> > /*
> > * Nothing to do for physical slots as we collect stats only for logical
> > * slots.
> > */
> > if (SlotIsPhysical(slot))
> > return;
>
> D'oh! Thanks! Fixed in v2 shared up-thread.

Thanks. Can we try to get rid of the multiple LWLockRelease calls in
pgstat_reset_replslot()? Is this any better?

/*
-* Nothing to do for physical slots as we collect stats only for logical
-* slots.
+* Reset stats if it is a logical slot. Nothing to do for physical slots
+* as we collect stats only for logical slots.
 */
-   if (SlotIsPhysical(slot))
-   {
-   LWLockRelease(ReplicationSlotControlLock);
-   return;
-   }
-
-   /* reset this one entry */
-   pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
-ReplicationSlotIndex(slot));
+   if (SlotIsLogical(slot))
+   pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
+ReplicationSlotIndex(slot));

LWLockRelease(ReplicationSlotControlLock);


Something similar in pgstat_fetch_replslot() perhaps?

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-03-06 Thread shveta malik
On Wed, Mar 6, 2024 at 6:54 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Wednesday, March 6, 2024 9:13 PM Zhijie Hou (Fujitsu) 
>  wrote:
> >
> > On Wednesday, March 6, 2024 11:04 AM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > On Wednesday, March 6, 2024 9:30 AM Masahiko Sawada
> > >  wrote:
> > >
> > > Hi,
> > >
> > > > On Fri, Mar 1, 2024 at 4:21 PM Zhijie Hou (Fujitsu)
> > > > 
> > > > wrote:
> > > > >
> > > > > On Friday, March 1, 2024 2:11 PM Masahiko Sawada
> > > >  wrote:
> > > > > >
> > > > > >
> > > > > > ---
> > > > > > +void
> > > > > > +assign_standby_slot_names(const char *newval, void *extra) {
> > > > > > +List  *standby_slots;
> > > > > > +MemoryContext oldcxt;
> > > > > > +char  *standby_slot_names_cpy = extra;
> > > > > > +
> > > > > >
> > > > > > Given that the newval and extra have the same data
> > > > > > (standby_slot_names value), why do we not use newval instead? I
> > > > > > think that if we use newval, we don't need to guc_strdup() in
> > > > > > check_standby_slot_names(), we might need to do list_copy_deep()
> > > > > > instead, though. It's not clear to me as there is no comment.
> > > > >
> > > > > I think SplitIdentifierString will modify the passed in string, so
> > > > > we'd better not pass the newval to it, otherwise the stored guc
> > > > > string(standby_slot_names) will be changed. I can see we are doing
> > > > > similar thing in other GUC check/assign function as well.
> > > > > (check_wal_consistency_checking/ assign_wal_consistency_checking,
> > > > > check_createrole_self_grant/ assign_createrole_self_grant ...).
> > > >
> > > > Why does it have to be a List in the first place?
> > >
> > > I thought the List type is convenient to use here, as we have existing
> > > list build function(SplitIdentifierString), and have convenient list
> > > macro to loop the
> > > list(foreach_ptr) which can save some codes.
> > >
> > > > In earlier version patches, we
> > > > used to copy the list and delete the element until it became empty,
> > > > while waiting for physical wal senders. But we now just refer to
> > > > each slot name in the list. The current code assumes that
> > > > stnadby_slot_names_cpy is allocated in GUCMemoryContext but once it
> > > > changes, it will silently get broken. I think we can check and
> > > > assign standby_slot_names in a similar way to
> > > > check/assign_temp_tablespaces and
> > check/assign_synchronous_standby_names.
> > >
> > > Yes, we could do follow it by allocating an array and copy each slot
> > > name into it, but it also requires some codes to build and scan the
> > > array. So, is it possible to expose the GucMemorycontext or have an API 
> > > like
> > guc_copy_list instead ?
> > > If we don't want to touch the guc api, I am ok with using an array as 
> > > well.
> >
> > I rethink about this and realize that it's not good to do the memory 
> > allocation in
> > assign hook function. As the "src/backend/utils/misc/README" said, we'd
> > better do that in check hook function and pass it via extra to assign hook
> > function. And thus array is a good choice in this case rather than a List 
> > which
> > cannot be passed to *extra.
> >
> > Here is the V107 patch set which parse and cache the standby slot names in 
> > an
> > array instead of a List.
>
> The patch needs to be rebased due to recent commit.
>
> Attach the V107_2 path set. There are no code changes in this version.

The patch needed to be rebased due to a recent commit. Attached is
v107_3; there are no code changes in this version.

thanks
Shveta


v107_3-0002-Document-the-steps-to-check-if-the-standby-is.patch
Description: Binary data


v107_3-0001-Allow-logical-walsenders-to-wait-for-the-phys.patch
Description: Binary data


Re: Missing LWLock protection in pgstat_reset_replslot()

2024-03-06 Thread shveta malik
On Wed, Mar 6, 2024 at 2:36 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Wed, Mar 06, 2024 at 10:24:46AM +0530, shveta malik wrote:
> > On Tue, Mar 5, 2024 at 6:52 PM Bertrand Drouvot
> >  wrote:
> > Thanks.  Can we try to get rid of multiple LwLockRelease in
> > pgstat_reset_replslot(). Is this any better?
> >
> > /*
> > -* Nothing to do for physical slots as we collect stats only for 
> > logical
> > -* slots.
> > +* Reset stats if it is a logical slot. Nothing to do for physical 
> > slots
> > +* as we collect stats only for logical slots.
> >  */
> > -   if (SlotIsPhysical(slot))
> > -   {
> > -   LWLockRelease(ReplicationSlotControlLock);
> > -   return;
> > -   }
> > -
> > -   /* reset this one entry */
> > -   pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
> > -ReplicationSlotIndex(slot));
> > +   if (SlotIsLogical(slot))
> > +   pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
> > +ReplicationSlotIndex(slot));
> >
> > LWLockRelease(ReplicationSlotControlLock);
> >
>
> Yeah, it's easier to read and probably reduce the pgstat_replslot.o object 
> file
> size a bit for non optimized build.
>
> > Something similar in pgstat_fetch_replslot() perhaps?
>
> Yeah, all of the above done in v3 attached.
>

Thanks for the patch.

For the fix in pgstat_fetch_replslot(), even with the lock taken in the
fetch call, there is a chance that the concerned slot can be dropped
and recreated:

--It can happen in a small window in pg_stat_get_replication_slot()
when we are consuming the return values of pgstat_fetch_replslot
(using slotent).

--Also it can happen at a later stage when we have switched to
fetching the next slot (obtained from 'pg_replication_slots' through
view' pg_stat_replication_slots'), the previous one can be dropped.

Ultimately the results of system view 'pg_replication_slots' can still
give us non-existing or re-created slots. But yes, I do not deny that
it gives us better consistency protection.

Do you feel that the lock in pgstat_fetch_replslot() should be moved
to its caller when we are done copying the results of that slot? This
will improve the protection slightly.

thanks
Shveta




Re: Missing LWLock protection in pgstat_reset_replslot()

2024-03-06 Thread shveta malik
On Thu, Mar 7, 2024 at 11:12 AM Michael Paquier  wrote:
>

> Yeah, it is possible that what you retrieve from
> pgstat_fetch_replslot() does not refer exactly to the slot's content
> under concurrent activity, but you cannot protect a single scan of
> pg_stat_replication_slots as of an effect of its design:
> pg_stat_get_replication_slot() has to be called multiple times.  The
> patch at least makes sure that the copy of the slot's stats retrieved
> by pgstat_fetch_entry() is slightly more consistent

Right, I understand that.

, but you cannot do
> better than that except if the data retrieved from
> pg_replication_slots and its stats are fetched in the same context
> function call, holding the replslot LWLock for the whole scan
> duration.

Yes, agreed.

>
> > Do you feel that the lock in pgstat_fetch_replslot() should be moved
> > to its caller when we are done copying the results of that slot? This
> > will improve the protection slightly.
>
> I don't see what extra protection this would offer, as
> pg_stat_get_replication_slot() is called once for each slot.  Feel
> free to correct me if I'm missing something.

It slightly improves the chances. pgstat_fetch_replslot() is only
called from pg_stat_get_replication_slot(), so I thought it might be
better to acquire the lock before we call pgstat_fetch_replslot() and
release it once we are done copying the results for that particular
slot. But I also understand that it will not prevent someone from
dropping that slot at a later stage, when the rest of the calls to
pg_stat_get_replication_slot() are still pending. So I am okay with the
current one.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2024-03-07 Thread shveta malik
On Fri, Mar 8, 2024 at 9:56 AM Ajin Cherian  wrote:
>
>> Pushed with minor modifications. I'll keep an eye on BF.
>>
>> BTW, one thing that we should try to evaluate a bit more is the
>> traversal of slots in StandbySlotsHaveCaughtup() where we verify if
>> all the slots mentioned in standby_slot_names have received the
>> required WAL. Even if the standby_slot_names list is short the total
>> number of slots can be much larger which can lead to an increase in
>> CPU usage during traversal. There is an optimization that allows to
>> cache ss_oldest_flush_lsn and ensures that we don't need to traverse
>> the slots each time so it may not hit frequently but still there is a
>> chance. I see it is possible to further optimize this area by caching
>> the position of each slot mentioned in standby_slot_names in
>> replication_slots array but not sure whether it is worth.
>>
>>
>
> I tried to test this by configuring a large number of logical slots while 
> making sure the standby slots are at the end of the array and checking if 
> there was any performance hit in logical replication from these searches.
>

Thanks, Ajin and Nisha.

We also plan:
1) Redoing XLogSendLogical time-log related test with
'sync_replication_slots' enabled.
2) pg_recvlogical test to monitor lag in StandbySlotsHaveCaughtup()
for a large number of slots.
3) Profiling to see if StandbySlotsHaveCaughtup() is noticeable in the
report when there are a large number of slots to traverse.

thanks
Shveta




Re: Regardign RecentFlushPtr in WalSndWaitForWal()

2024-03-11 Thread shveta malik
On Sat, Mar 2, 2024 at 4:44 PM Amit Kapila  wrote:
>
> Right, I think the quoted code has check "if (!RecoveryInProgress())".
>
> >
>  But apart from that, your
> > observation seems accurate, yes.
> >
>
> I also find the observation correct and the code has been like that
> since commit 5a991ef8 [1]. So, let's wait to see if Robert or Andres
> remembers the reason, otherwise, we should probably nuke this code.

Please find the patch attached for the same.

thanks
Shveta


v1-0001-Remove-redundant-RecentFlushPtr-fetch-in-WalSndWa.patch
Description: Binary data


Re: Add comment to specify timeout unit in ConditionVariableTimedSleep()

2024-03-11 Thread shveta malik
On Sat, Mar 9, 2024 at 12:19 PM Michael Paquier  wrote:
>
> On Tue, Mar 05, 2024 at 03:20:48PM +0900, Michael Paquier wrote:
> > That sounds like a good idea to me, so I'm OK with your suggestion.
>
> Applied this one as f160bf06f72a.  Thanks.

Thanks!

thanks
Shveta




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-12 Thread shveta malik
On Wed, Mar 6, 2024 at 2:47 PM Bharath Rupireddy
 wrote:
>
> On Wed, Mar 6, 2024 at 2:42 PM Bertrand Drouvot
>  wrote:
> >
> > Hi,
> >
> > On Tue, Mar 05, 2024 at 01:44:43PM -0600, Nathan Bossart wrote:
> > > On Wed, Mar 06, 2024 at 12:50:38AM +0530, Bharath Rupireddy wrote:
> > > > On Mon, Mar 4, 2024 at 2:11 PM Bertrand Drouvot
> > > >  wrote:
> > > >> On Sun, Mar 03, 2024 at 03:44:34PM -0600, Nathan Bossart wrote:
> > > >> > Unless I am misinterpreting some details, ISTM we could rename this 
> > > >> > column
> > > >> > to invalidation_reason and use it for both logical and physical 
> > > >> > slots.  I'm
> > > >> > not seeing a strong need for another column.
> > > >>
> > > >> Yeah having two columns was more for convenience purpose. Without the 
> > > >> "conflict"
> > > >> one, a slot conflicting with recovery would be "a logical slot having 
> > > >> a non NULL
> > > >> invalidation_reason".
> > > >>
> > > >> I'm also fine with one column if most of you prefer that way.
> > > >
> > > > While we debate on the above, please find the attached v7 patch set
> > > > after rebasing.
> > >
> > > It looks like Bertrand is okay with reusing the same column for both
> > > logical and physical slots
> >
> > Yeah, I'm okay with one column.
>
> Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.

JFYI, the patch does not apply on HEAD; there are conflicts in
multiple files.

thanks
Shveta




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-13 Thread shveta malik
> JFYI, the patch does not apply to the head. There is a conflict in
> multiple files.

For review purposes, I applied v8 to the March 6 code-base. I have yet
to review in detail, please find my initial thoughts:

1)
I found that 'inactive_replication_slot_timeout' works only if a
walsender was ever started for that slot, since the logic is under the
'am_walsender' check. Is this intentional?
If I create a slot and use only pg_logical_slot_get_changes() or
pg_replication_slot_advance() on it, it never gets invalidated due to
the timeout. Whereas when I set 'max_slot_xid_age' or, say,
'max_slot_wal_keep_size' to a lower value, the said slot is invalidated
correctly with the 'xid_aged' and 'wal_removed' reasons respectively.

Example:
With inactive_replication_slot_timeout=1min, test1_3  is the slot for
which there is no walsender and only advance and get_changes SQL
functions were called; test1_4 is the one for which pg_recvlogical was
run for a second.

 test1_3 |   785 | | reserved   |   | t
|  |
 test1_4 |   798 | | lost   | inactive_timeout| t|
2024-03-13 11:52:41.58446+05:30  |

And when inactive_replication_slot_timeout=0  and max_slot_xid_age=10

 test1_3 |   785 | | lost   |  xid_aged   | t
  |  |
 test1_4 |   798 | | lost   | inactive_timeout| t|
2024-03-13 11:52:41.58446+05:30  |


2)
The msg for patch 3 says:
--
a) when replication slots is lying inactive for a day or so using
last_inactive_at metric,
b) when a replication slot is becoming inactive too frequently using
last_inactive_at metric.
--
 I think in b, you want to refer to inactive_count instead of last_inactive_at?

3)
I do not see invalidation_reason updated for 2 new reasons in system-views.sgml


thanks
Shveta




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-13 Thread shveta malik
On Fri, Mar 8, 2024 at 10:42 PM Bharath Rupireddy
 wrote:
>
> On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila  wrote:
> >
> > You might want to consider its interaction with sync slots on standby.
> > Say, there is no activity on slots in terms of processing the changes
> > for slots. Now, we won't perform sync of such slots on standby showing
> > them inactive as per your new criteria where as same slots could still
> > be valid on primary as the walsender is still active. This may be more
> > of a theoretical point as in running system there will probably be
> > some activity but I think this needs some thougths.
>
> I believe the xmin and catalog_xmin of the sync slots on the standby
> keep advancing depending on the slots on the primary, no? If yes, the
> XID age based invalidation shouldn't be a problem.

If the user has not enabled the slot-sync worker and is relying on the
SQL function pg_sync_replication_slots(), then the xmin and
catalog_xmin of synced slots may not keep advancing. These will be
advanced only on the next run of the function, but meanwhile the
synced slots may be invalidated due to 'xid_aged'. Then, the next time
the user runs pg_sync_replication_slots(), the invalidated slots will
be dropped and recreated by this SQL function (provided they are valid
on the primary and invalidated on the standby alone). I am not stating
that this is a problem, but we need to think about whether this is
what we want. Secondly, the behaviour is not the same as
'inactive_timeout' invalidation: synced slots are immune to
'inactive_timeout' invalidation, as that invalidation happens only in
the walsender, while they are not immune to 'xid_aged' invalidation.
So again, this needs some thought.

> I believe there are no walsenders started for the sync slots on the
> standbys, right? If yes, the inactive timeout based invalidation also
> shouldn't be a problem. Because, the inactive timeouts for a slot are
> tracked only for walsenders because they are the ones that typically
> hold replication slots for longer durations and for real replication
> use. We did a similar thing in a recent commit [1].
>
> Is my understanding right? Do you still see any problems with it?

I have explained the situation above for us to think over it better.

thanks
Shveta




Re: Have pg_basebackup write "dbname" in "primary_conninfo"?

2024-03-14 Thread shveta malik
On Thu, Mar 14, 2024 at 10:57 AM Amit Kapila  wrote:
>
> On Thu, Mar 14, 2024 at 5:57 AM Masahiko Sawada  wrote:
> >
> > This fact makes me think that the slotsync worker might be able to
> > accept the primary_conninfo value even if there is no dbname in the
> > value. That is, if there is no dbname in the primary_conninfo, it uses
> > the username in accordance with the specs of the connection string.
> > Currently, the slotsync worker connects to the local database first
> > and then establishes the connection to the primary server. But if we
> > can reverse the two steps, it can get the dbname that has actually
> > been used to establish the remote connection and use it for the local
> > connection too. That way, the primary_conninfo generated by
> > pg_basebackup could work even without the patch. For example, if the
> > OS user executing pg_basebackup is 'postgres', the slotsync worker
> > would connect to the postgres database. Given the 'postgres' database
> > is created by default and 'postgres' OS user is used in common, I
> > guess it could cover many cases in practice actually.
> >
>
> I think this is worth investigating but I suspect that in most cases
> users will end up using a replication connection without specifying
> the user name and we may not be able to give a meaningful error
> message when slotsync worker won't be able to connect. The same will
> be true even when the dbname same as the username would be used.
>

I attempted the change as suggested by Sawada-San. Attached is the PoC
patch. For it to work, I had to expose a new getter API in libpq-fe
which retrieves the dbname from the streaming connection. Please have
a look.

Without this PoC patch, the errors in the slot-sync worker are:

-
a) If dbname is missing:
[1230932] LOG:  slot sync worker started
[1230932] ERROR:  slot synchronization requires dbname to be specified
in primary_conninfo

b) If specified db does not exist
[1230913] LOG:  slot sync worker started
[1230913] FATAL:  database "postgres1" does not exist
-

Now with this patch:
-
a) If the dbname (same as the user name) does not exist:
[1232473] LOG:  slot sync worker started
[1232473] ERROR:  could not connect to the primary server: connection
to server at "127.0.0.1", port 5433 failed: FATAL: database
"bckp_user" does not exist

b) If the user itself is removed from primary_conninfo, libpq defaults
to the OS user who authenticated, and gives an error if a database of
the same name does not exist:
ERROR:  could not connect to the primary server: connection to server
at "127.0.0.1", port 5433 failed: FATAL:  database "shveta" does not
exist
-

The errors in the second case look slightly confusing to me.

thanks
Shveta


v1-0001-Use-user-as-dbname-for-slot-sync.patch
Description: Binary data


Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-14 Thread shveta malik
On Thu, Mar 14, 2024 at 7:58 PM Bharath Rupireddy
 wrote:
>
> On Thu, Mar 14, 2024 at 12:24 PM Amit Kapila  wrote:
> >
> > On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy
> > >
> > > Yes, there will be some sort of duplicity if we emit conflict_reason
> > > as a text field. However, I still think the better way is to turn
> > > conflict_reason text to conflict boolean and set it to true only on
> > > rows_removed and wal_level_insufficient invalidations. When conflict
> > > boolean is true, one (including all the tests that we've added
> > > recently) can look for invalidation_reason text field for the reason.
> > > This sounds reasonable to me as opposed to we just mentioning in the
> > > docs that "if invalidation_reason is rows_removed or
> > > wal_level_insufficient it's the reason for conflict with recovery".

+1 on maintaining both conflicting and invalidation_reason

> > Fair point. I think we can go either way. Bertrand, Nathan, and
> > others, do you have an opinion on this matter?
>
> While we wait to hear from others on this, I'm attaching the v9 patch
> set implementing the above idea (check 0001 patch). Please have a
> look. I'll come back to the other review comments soon.

Thanks for the patch. JFYI, the v9 patch does not apply to HEAD; some
recent commit caused the conflict.

Some trivial comments on patch001 (yet to review other patches)

1)
info.c:

- "%s as caught_up, conflict_reason IS NOT NULL as invalid "
+ "%s as caught_up, invalidation_reason IS NOT NULL as invalid "

Can we revert to 'conflicting as invalid', since this is a query for
logical slots only?

2)
040_standby_failover_slots_sync.pl:

- q{SELECT conflict_reason IS NULL AND synced AND NOT temporary FROM
pg_replication_slots WHERE slot_name = 'lsub1_slot';}
+ q{SELECT invalidation_reason IS NULL AND synced AND NOT temporary
FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';}

Here too, can we have 'NOT conflicting' instead of
'invalidation_reason IS NULL', as it is a logical-slot test?

thanks
Shveta




Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-15 Thread shveta malik
On Thu, Mar 14, 2024 at 7:58 PM Bharath Rupireddy
 wrote:
>
> While we wait to hear from others on this, I'm attaching the v9 patch
> set implementing the above idea (check 0001 patch). Please have a
> look. I'll come back to the other review comments soon.
>

patch002:

1)
I would like to understand the purpose of 'inactive_count'. Is it only
for monitoring purposes? We are not using it anywhere internally.

I shut down the instance 5 times and found that 'inactive_count'
became 5 for all the slots created on that instance. Is this
intentional? I mean, we cannot really use the slots while the instance
is down. I feel it should increment inactive_count only if, during the
lifetime of the instance, the slots were actually inactive, i.e. no
streaming or replication was happening through them.


2)
slot.c:
+ case RS_INVAL_XID_AGE:
+ {
+ if (TransactionIdIsNormal(s->data.xmin))
+ {
+  ..
+ }
+ if (TransactionIdIsNormal(s->data.catalog_xmin))
+ {
+  ..
+ }
+ }

Can we optimize this code? It has duplicate code for processing
s->data.catalog_xmin and s->data.xmin. Can we create a sub-function
for this purpose and call it twice here?
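A minimal sketch of the suggested de-duplication (the helper names and
the simplified age arithmetic are mine, not the patch's actual code;
the real check would use the configured max_slot_xid_age with
epoch-aware TransactionId comparison):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's TransactionId machinery. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Decide whether one xmin-like value exceeds the configured age
 * limit.  "age" is simplified to a plain subtraction here; the real
 * code would guard against wraparound.
 */
static bool
xid_exceeds_age_limit(TransactionId xid, TransactionId next_xid,
                      uint32_t max_age)
{
    if (!TransactionIdIsNormal(xid))
        return false;           /* nothing to check for this field */
    return (next_xid - xid) > max_age;
}

/* The caller invokes the helper twice, once per field, instead of
 * duplicating the body for xmin and catalog_xmin. */
static bool
slot_invalidated_by_xid_age(TransactionId xmin, TransactionId catalog_xmin,
                            TransactionId next_xid, uint32_t max_age)
{
    return xid_exceeds_age_limit(xmin, next_xid, max_age) ||
           xid_exceeds_age_limit(catalog_xmin, next_xid, max_age);
}
```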

thanks
Shveta




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-03 Thread shveta malik
On Tue, Jan 3, 2023 at 11:10 AM wangw.f...@fujitsu.com
 wrote:
>
> On Mon, Jan 2, 2023 at 18:54 PM Amit Kapila  wrote:
> > On Fri, Dec 30, 2022 at 3:55 PM wangw.f...@fujitsu.com
> >  wrote:
> > >
> > > I've checked it and it looks good to me.
> > > Rebased the other patches and ran the pgident for the patch set.
> > >
> > > Attach the new patch set.
> > >
> >
> > I have added a few DEBUG messages and changed a few comments in the
> > 0001 patch. With that v71-0001* looks good to me and I'll commit it
> > later this week (by Thursday or Friday) unless there are any major
> > comments or objections.
>
> Thanks for your improvement.
>
> Rebased the patch set because the new change in HEAD (c8e1ba7).
> Attach the new patch set.
>
> Regards,
> Wang wei

Hi,
In continuation with [1] and [2], I did some performance testing on
v70-0001 patch.

This test used synchronous logical replication and compared SQL
execution times before and after applying the patch.

The following cases are tested by varying logical_decoding_work_mem:
a) Bulk insert.
b) Bulk delete.
c) Bulk update.
d) Rollback to savepoint (different percentages of changes in the
transaction are rolled back).

The tests are performed ten times, and the average of the middle eight is taken.

The scripts are the same as before [1]. The scripts for additional
update and delete testing are attached.

The results are as follows:

RESULT - bulk insert (5kk)
---
logical_decoding_work_mem    64kB       256kB      64MB
HEAD                         34.475     34.222     34.400
patched                      20.168     20.181     20.510
Compare with HEAD           -41.49%    -41.029%   -40.377%


RESULT - bulk delete (5kk)
---
logical_decoding_work_mem    64kB       256kB      64MB
HEAD                         40.286     41.312     41.312
patched                      23.749     23.759     23.480
Compare with HEAD           -41.04%    -42.48%    -43.16%


RESULT - bulk update (5kk)
---
logical_decoding_work_mem    64kB       256kB      64MB
HEAD                         63.650     65.260     65.459
patched                      46.692     46.275     48.281
Compare with HEAD           -26.64%    -29.09%    -26.24%


RESULT - rollback 10% (5kk)
---
logical_decoding_work_mem    64kB       256kB      64MB
HEAD                         33.386     33.213     31.990
patched                      20.540     19.295     18.139
Compare with HEAD           -38.47%    -41.90%    -43.29%


RESULT - rollback 20% (5kk)
---
logical_decoding_work_mem    64kB       256kB      64MB
HEAD                         32.150     31.871     30.825
patched                      19.331     19.366     18.285
Compare with HEAD           -39.87%    -39.23%    -40.68%


RESULT - rollback 30% (5kk)
---
logical_decoding_work_mem    64kB       256kB      64MB
HEAD                         28.611     30.139     29.433
patched                      19.632     19.838     18.374
Compare with HEAD           -31.38%    -34.17%    -37.57%


RESULT - rollback 50% (5kk)
---
logical_decoding_work_mem    64kB       256kB      64MB
HEAD                         27.410     27.167     25.990
patched                      19.982     18.749     18.048
Compare with HEAD           -27.099%   -30.98%    -30.55%

(if "Compare with HEAD" is a positive number, it means worse than
HEAD; if it is a negative number, it means better than HEAD.)

Summary:
Update shows a 26-29% improvement, while insert and delete show a ~40%
improvement. In the case of rollback, the improvement is between 27%
and 43%, and it decreases slightly with larger amounts of data being
rolled back.
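The "Compare with HEAD" percentages above follow from averaging the
middle eight of the ten runs; a minimal sketch of that computation
(function names and the plain-double representation are mine, not the
actual test harness):

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* qsort comparator for doubles */
static int
cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Average of the middle (n - 2) runs: sort, then drop the fastest and
 * slowest run. */
static double
trimmed_mean(double *runs, int n)
{
    double sum = 0.0;

    qsort(runs, n, sizeof(double), cmp_double);
    for (int i = 1; i < n - 1; i++)
        sum += runs[i];
    return sum / (n - 2);
}

/* "Compare with HEAD": negative means faster (better) than HEAD. */
static double
pct_vs_head(double head, double patched)
{
    return (patched - head) / head * 100.0;
}
```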


[1] 
https://www.postgresql.org/message-id/OSZPR01MB63103AA97349BBB858E27DEAFD499%40OSZPR01MB6310.jpnprd01.prod.outlook.com
[2] 
https://www.postgresql.org/message-id/OSZPR01MB6310174063C9144D2081F657FDE09%40OSZPR01MB6310.jpnprd01.prod.outlook.com

thanks
Shveta


Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-03 Thread shveta malik
>
> On Tue, 27 Dec 2022 at 14:59, Hayato Kuroda (Fujitsu)
>  wrote:
> > Note that more than half of the modifications are done by Osumi-san.
>

Please find a few minor comments.
1.
+  diffms = TimestampDifferenceMilliseconds(GetCurrentTimestamp(),
+

   TimestampTzPlusMilliseconds(ts, MySubscription->minapplydelay));
 On Unix, the above code looks unaligned (copied from a Unix editor).

2. same with:
+  interval = DatumGetIntervalP(DirectFunctionCall3(interval_in,
+

   CStringGetDatum(val),
+

   ObjectIdGetDatum(InvalidOid),
+

   Int32GetDatum(-1)));
perhaps due to tabs?

3. This comment is not clear:
+  * During the time delayed replication, avoid reporting the suspended
+  * latest LSN are already flushed and written, to the publisher.

4.
+  * Call send_feedback() to prevent the publisher from exiting by
+  * timeout during the delay, when wal_receiver_status_interval is
+  * available. The WALs for this delayed transaction is neither
+  * written nor flushed yet, Thus, we don't make the latest LSN
+  * overwrite those positions of the update message for this delay.

 yet, Thus, we -->  yet, thus, we/   yet. Thus, we


5.
+  /* Adds portion time (in ms) to the previous result. */
+  ms = interval->time / INT64CONST(1000);
Is interval->time always in microseconds here?
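For what it's worth, on int64-timestamp builds (all supported builds)
Interval->time does hold microseconds, which is also why a bare '-1'
(parsed as -1 second) comes out as -1000 ms. A self-contained model of
the conversion (the struct layout and function name here are a
simplified sketch of the quoted interval2ms() patch, without its
overflow ereport() machinery):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of PostgreSQL's Interval: "time" holds the
 * sub-day part in microseconds, while "day" and "month" hold the
 * larger units separately.
 */
typedef struct
{
    int64_t time;               /* sub-day part, in microseconds */
    int32_t day;
    int32_t month;
} Interval;

#define MSECS_PER_DAY INT64_C(86400000)

/* Months are approximated as 30 days, as in the quoted patch. */
static int64_t
interval2ms_sketch(const Interval *iv)
{
    int64_t days = (int64_t) iv->month * 30 + iv->day;

    return days * MSECS_PER_DAY + iv->time / 1000;  /* usec -> msec */
}
```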



Thanks
Shveta




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread shveta malik
On Tue, Jan 10, 2023 at 7:42 PM Takamichi Osumi (Fujitsu)
 wrote:
>
> On Tuesday, January 3, 2023 4:01 PM vignesh C  wrote:
> Hi, thanks for your review !
>
>
> > 1) This global variable can be removed as it is used only in send_feedback 
> > which
> > is called from maybe_delay_apply so we could pass it as a function argument:
> > + * delay, avoid having positions of the flushed and apply LSN
> > +overwritten by
> > + * the latest LSN.
> > + */
> > +static bool in_delaying_apply = false;
> > +static XLogRecPtr last_received = InvalidXLogRecPtr;
> > +
> I have removed the first variable and make it one of the arguments for 
> send_feedback().
>
> > 2) -1 gets converted to -1000
> >
> > +int64
> > +interval2ms(const Interval *interval)
> > +{
> > +   int64   days;
> > +   int64   ms;
> > +   int64   result;
> > +
> > +   days = interval->month * INT64CONST(30);
> > +   days += interval->day;
> > +
> > +   /* Detect whether the value of interval can cause an overflow. */
> > +   if (pg_mul_s64_overflow(days, MSECS_PER_DAY, &result))
> > +   ereport(ERROR,
> > +
> > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> > +errmsg("bigint out of range")));
> > +
> > +   /* Adds portion time (in ms) to the previous result. */
> > +   ms = interval->time / INT64CONST(1000);
> > +   if (pg_add_s64_overflow(result, ms, &result))
> > +   ereport(ERROR,
> > +
> > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> > +errmsg("bigint out of range")));
> >
> > create subscription sub7 connection 'dbname=regression host=localhost
> > port=5432' publication pub1 with (min_apply_delay = '-1');
> > ERROR:  -1000 ms is outside the valid range for parameter "min_apply_delay"
> Good catch! Fixed in order to make input '-1' interpretted as -1 ms.
>
> > 3) This can be slightly reworded:
> > +  
> > +   The length of time (ms) to delay the application of changes.
> > +  
> > to:
> > Delay applying the changes by a specified amount of time(ms).
> This has been suggested in [1] by Peter Smith. So, I'd like to keep the 
> current patch's description.
> Then, I didn't change this.
>
> > 4)  maybe_delay_apply can be moved from apply_handle_stream_prepare to
> > apply_spooled_messages so that it is consistent with
> > maybe_start_skipping_changes:
> > @@ -1120,6 +1240,19 @@ apply_handle_stream_prepare(StringInfo s)
> >
> > elog(DEBUG1, "received prepare for streamed transaction %u",
> > prepare_data.xid);
> >
> > +   /*
> > +* Should we delay the current prepared transaction?
> > +*
> > +* Although the delay is applied in BEGIN PREPARE messages,
> > streamed
> > +* prepared transactions apply the delay in a STREAM PREPARE
> > message.
> > +* That's ok because no changes have been applied yet
> > +* (apply_spooled_messages() will do it). The STREAM START message
> > does
> > +* not contain a prepare time (it will be available when the 
> > in-progress
> > +* prepared transaction finishes), hence, it was not possible to 
> > apply a
> > +* delay at that time.
> > +*/
> > +   maybe_delay_apply(prepare_data.prepare_time);
> >
> >
> > That way the call from apply_handle_stream_commit can also be removed.
> Sounds good. I moved the call of maybe_delay_apply() to the 
> apply_spooled_messages().
> Now it's aligned with maybe_start_skipping_changes().
>
> > 5) typo transfering should be transferring
> > +  publisher and the current time on the subscriber. Time
> > spent in logical
> > +  decoding and in transfering the transaction may reduce the
> > actual wait
> > +  time. If the system clocks on publisher and subscriber are
> > + not
> Fixed.
>
> > 6) feedbacks can be changed to feedback messages
> > + * it's necessary to keep sending feedbacks during the delay from the
> > + worker
> > + * process. Meanwhile, the feature delays the apply before starting the
> Fixed.
>
> > 7)
> > + /*
> > + * Suppress overwrites of flushed and writtten positions by the lastest
> > + * LSN in send_feedback().
> > + */
> >
> > 7a) typo writtten should be written
> >
> > 7b) lastest should latest
> I have removed this sentence. So, those typos are removed.
>
> Please have a look at the updated patch.
>
> [1] - 
> https://www.postgresql.org/message-id/CAHut%2BPttQdFMNM2c6WqKt2c9G6r3ZKYRGHm04RR-4p4fyA4WRg%40mail.gmail.com
>
>

Hi,

1.
+ errmsg("min_apply_delay must not be set when streaming = parallel")));
we give the same error msg for both these cases:
a. when a subscription is created with streaming=parallel but we are
trying to alter the subscription to set min_apply_delay > 0;
b. when a subscription is created with some min_apply_delay and we are
trying to alter the subscription to make it streaming=parallel.
For case a, the error msg looks fine, but for case b, I think the
error msg should be changed slightly.

Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread shveta malik
On Wed, Jan 11, 2023 at 3:27 PM shveta malik  wrote:
>
> On Tue, Jan 10, 2023 at 7:42 PM Takamichi Osumi (Fujitsu)
>  wrote:
> >
> > On Tuesday, January 3, 2023 4:01 PM vignesh C  wrote:
> > Hi, thanks for your review !
> >
> >
> > > 1) This global variable can be removed as it is used only in 
> > > send_feedback which
> > > is called from maybe_delay_apply so we could pass it as a function 
> > > argument:
> > > + * delay, avoid having positions of the flushed and apply LSN
> > > +overwritten by
> > > + * the latest LSN.
> > > + */
> > > +static bool in_delaying_apply = false;
> > > +static XLogRecPtr last_received = InvalidXLogRecPtr;
> > > +
> > I have removed the first variable and make it one of the arguments for 
> > send_feedback().
> >
> > > 2) -1 gets converted to -1000
> > >
> > > +int64
> > > +interval2ms(const Interval *interval)
> > > +{
> > > +   int64   days;
> > > +   int64   ms;
> > > +   int64   result;
> > > +
> > > +   days = interval->month * INT64CONST(30);
> > > +   days += interval->day;
> > > +
> > > +   /* Detect whether the value of interval can cause an overflow. */
> > > +   if (pg_mul_s64_overflow(days, MSECS_PER_DAY, &result))
> > > +   ereport(ERROR,
> > > +
> > > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> > > +errmsg("bigint out of range")));
> > > +
> > > +   /* Adds portion time (in ms) to the previous result. */
> > > +   ms = interval->time / INT64CONST(1000);
> > > +   if (pg_add_s64_overflow(result, ms, &result))
> > > +   ereport(ERROR,
> > > +
> > > (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> > > +errmsg("bigint out of range")));
> > >
> > > create subscription sub7 connection 'dbname=regression host=localhost
> > > port=5432' publication pub1 with (min_apply_delay = '-1');
> > > ERROR:  -1000 ms is outside the valid range for parameter 
> > > "min_apply_delay"
> > Good catch! Fixed in order to make input '-1' interpretted as -1 ms.
> >
> > > 3) This can be slightly reworded:
> > > +  
> > > +   The length of time (ms) to delay the application of changes.
> > > +  
> > > to:
> > > Delay applying the changes by a specified amount of time(ms).
> > This has been suggested in [1] by Peter Smith. So, I'd like to keep the 
> > current patch's description.
> > Then, I didn't change this.
> >
> > > 4)  maybe_delay_apply can be moved from apply_handle_stream_prepare to
> > > apply_spooled_messages so that it is consistent with
> > > maybe_start_skipping_changes:
> > > @@ -1120,6 +1240,19 @@ apply_handle_stream_prepare(StringInfo s)
> > >
> > > elog(DEBUG1, "received prepare for streamed transaction %u",
> > > prepare_data.xid);
> > >
> > > +   /*
> > > +* Should we delay the current prepared transaction?
> > > +*
> > > +* Although the delay is applied in BEGIN PREPARE messages,
> > > streamed
> > > +* prepared transactions apply the delay in a STREAM PREPARE
> > > message.
> > > +* That's ok because no changes have been applied yet
> > > +* (apply_spooled_messages() will do it). The STREAM START message
> > > does
> > > +* not contain a prepare time (it will be available when the 
> > > in-progress
> > > +* prepared transaction finishes), hence, it was not possible to 
> > > apply a
> > > +* delay at that time.
> > > +*/
> > > +   maybe_delay_apply(prepare_data.prepare_time);
> > >
> > >
> > > That way the call from apply_handle_stream_commit can also be removed.
> > Sounds good. I moved the call of maybe_delay_apply() to the 
> > apply_spooled_messages().
> > Now it's aligned with maybe_start_skipping_changes().
> >
> > > 5) typo transfering should be transferring
> > > +  publisher and the current time on the subscriber. Time
> > > spent in logical
> > > +  decoding and in transfering the transaction may reduce the
> > > actual wait
> > >

Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-11 Thread shveta malik
On Wed, Jan 11, 2023 at 6:16 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shveta,
>
> Thanks for reviewing! PSA new version.
>
> > 1.
> > + errmsg("min_apply_delay must not be set when streaming = parallel")));
> > we give the same error msg for  both the cases:
> > a. when subscription is created with streaming=parallel  but we are
> > trying to alter subscription to set min_apply_delay >0
> > b. when subscription is created with some min_apply_delay and we are
> > trying to alter subscription to make it streaming=parallel.
> > For case a, error msg looks fine but for case b, I think error msg
> > should be changed slightly.
> > ALTER SUBSCRIPTION regress_testsub SET (streaming = parallel);
> > ERROR:  min_apply_delay must not be set when streaming = parallel
> > This gives the feeling that we are trying to modify min_apply_delay
> > but we are not. Maybe we can change it to:
> > "subscription with min_apply_delay must not be allowed to stream
> > parallel" (or something better)
>
> Your point that error messages are strange is right. And while
> checking other ones, I found they have very similar styles. Therefore I 
> reworded
> ERROR messages in AlterSubscription() and parse_subscription_options() to 
> follow
> them. Which version is better?
>

The v14 one looks much better. Thanks!

thanks
Shveta




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-12 Thread shveta malik
On Wed, Jan 11, 2023 at 6:16 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > 2.
> > I think users can set ' wal_receiver_status_interval ' to 0 or more
> > than 'wal_sender_timeout'. But is this a frequent use-case scenario or
> > do we see DBAs setting these in such a way by mistake? If so, then I
> > think, it is better to give Warning message in such a case when a user
> > tries to create or alter a subscription with a large 'min_apply_delay'
> > (>=  'wal_sender_timeout') , rather than leaving it to the user's
> > understanding that WalSender may repeatedly timeout in such a case.
> > Parse_subscription_options and AlterSubscription can be modified to
> > log a warning. Any thoughts?
>
> Yes, DBAs may set wal_receiver_status_interval to more than 
> wal_sender_timeout by
> mistake.
>
> But to handle the scenario we must compare between min_apply_delay *on 
> subscriber*
> and wal_sender_timeout *on publisher*. Both values are not transferred to 
> opposite
> sides, so the WARNING cannot be raised. I considered that such a mechanism 
> seemed
> to be complex. The discussion around [1] may be useful.
>
> [1]: 
> https://www.postgresql.org/message-id/CAA4eK1Lq%2Bh8qo%2BrqGU-E%2BhwJKAHYocV54y4pvou4rLysCgYD-g%40mail.gmail.com
>

Okay, I see. So even when 'wal_receiver_status_interval' is set to 0,
no log/warning is needed when the user tries to set min_apply_delay > 0?
Are we good with the doc change alone?

One trivial correction in config.sgml:
+   terminates due to the timeout errors. Hence, make sure this parameter
+   shorter than the wal_sender_timeout of the publisher.
Hence, make sure this parameter is shorter...  

thanks
Shveta




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-12 Thread shveta malik
On Thu, Jan 12, 2023 at 10:34 AM Amit Kapila  wrote:
>
> On Thu, Jan 12, 2023 at 9:54 AM Peter Smith  wrote:
> >
> >
> > doc/src/sgml/monitoring.sgml
> >
> > 5. pg_stat_subscription
> >
> > @@ -3198,11 +3198,22 @@ SELECT pid, wait_event_type, wait_event FROM
> > pg_stat_activity WHERE wait_event i
> >
> >   
> >
> > +   apply_leader_pid integer
> > +  
> > +  
> > +   Process ID of the leader apply worker, if this process is a apply
> > +   parallel worker. NULL if this process is a leader apply worker or a
> > +   synchronization worker.
> > +  
> > + 
> > +
> > + 
> > +  
> > relid oid
> >
> >
> > OID of the relation that the worker is synchronizing; null for the
> > -   main apply worker
> > +   main apply worker and the parallel apply worker
> >
> >   
> >
> > 5a.
> >
> > (Same as general comment #1 about terminology)
> >
> > "apply_leader_pid" --> "leader_apply_pid"
> >
>
> How about naming this as just leader_pid? I think it could be helpful
> in the future if we decide to parallelize initial sync (aka parallel
> copy) because then we could use this for the leader PID of parallel
> sync workers as well.
>
> --

I still prefer leader_apply_pid.
leader_pid does not tell which 'operation' it belongs to; 'apply'
gives the clarity that it is an apply-related process.

The terms used in the patch look very confusing. I had to read a few
lines multiple times to understand them.

1.
The summary says 'main_worker_pid' is to be added, but I do not see
'main_worker_pid' added in pg_stat_subscription; instead I see
'apply_leader_pid'. Am I missing something? Also, as stated above,
'leader_apply_pid' makes more sense.
It is better to correct it everywhere (apply leader --> leader apply).
Once that is done, it can be reviewed again.

thanks
Shveta




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-12 Thread shveta malik
On Thu, Jan 12, 2023 at 4:37 PM Amit Kapila  wrote:
>
>
> But then do you suggest that tomorrow if we allow parallel sync
> workers then we have a separate column leader_sync_pid? I think that
> doesn't sound like a good idea and moreover one can refer to docs for
> clarification.
>
> --
okay, leader_pid is fine I think.

thanks
Shveta




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-16 Thread shveta malik
On Tue, Jan 17, 2023 at 9:07 AM houzj.f...@fujitsu.com
 wrote:
>
> On Tuesday, January 17, 2023 11:32 AM Peter Smith  
> wrote:
> >
> > On Tue, Jan 17, 2023 at 1:21 PM houzj.f...@fujitsu.com
> >  wrote:
> > >
> > > On Tuesday, January 17, 2023 5:43 AM Peter Smith
> >  wrote:
> > > >
> > > > On Mon, Jan 16, 2023 at 5:41 PM Amit Kapila
> > > > 
> > > > wrote:
> > > > >
> > > > > On Mon, Jan 16, 2023 at 10:24 AM Peter Smith
> > > > > 
> > > > wrote:
> > > > > >
> > > > > > 2.
> > > > > >
> > > > > >  /*
> > > > > > + * Return the pid of the leader apply worker if the given pid
> > > > > > +is the pid of a
> > > > > > + * parallel apply worker, otherwise return InvalidPid.
> > > > > > + */
> > > > > > +pid_t
> > > > > > +GetLeaderApplyWorkerPid(pid_t pid) {  int leader_pid =
> > > > > > +InvalidPid;  int i;
> > > > > > +
> > > > > > + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
> > > > > > +
> > > > > > + for (i = 0; i < max_logical_replication_workers; i++) {
> > > > > > + LogicalRepWorker *w = &LogicalRepCtx->workers[i];
> > > > > > +
> > > > > > + if (isParallelApplyWorker(w) && w->proc && pid ==
> > > > > > + w->proc->pid) { leader_pid = w->leader_pid; break; } }
> > > > > > +
> > > > > > + LWLockRelease(LogicalRepWorkerLock);
> > > > > > +
> > > > > > + return leader_pid;
> > > > > > +}
> > > > > >
> > > > > > 2a.
> > > > > > IIUC the IsParallelApplyWorker macro does nothing except check
> > > > > > that the leader_pid is not InvalidPid anyway, so AFAIK this
> > > > > > algorithm does not benefit from using this macro because we will
> > > > > > want to return InvalidPid anyway if the given pid matches.
> > > > > >
> > > > > > So the inner condition can just say:
> > > > > >
> > > > > > if (w->proc && w->proc->pid == pid) { leader_pid =
> > > > > > w->leader_pid; break; }
> > > > > >
> > > > >
> > > > > Yeah, this should also work but I feel the current one is explicit
> > > > > and more clear.
> > > >
> > > > OK.
> > > >
> > > > But, I have one last comment about this function -- I saw there are
> > > > already other functions that iterate max_logical_replication_workers
> > > > like this looking for things:
> > > > - logicalrep_worker_find
> > > > - logicalrep_workers_find
> > > > - logicalrep_worker_launch
> > > > - logicalrep_sync_worker_count
> > > >
> > > > So I felt this new function (currently called
> > > > GetLeaderApplyWorkerPid) ought to be named similarly to those ones.
> > > > e.g. call it something like "logicalrep_worker_find_pa_leader_pid".
> > > >
> > >
> > > I am not sure we can use the name, because currently all the API name
> > > in launcher that used by other module(not related to subscription) are
> > > like AxxBxx style(see the functions in logicallauncher.h).
> > > logicalrep_worker_xxx style functions are currently only declared in
> > > worker_internal.h.
> > >
> >
> > OK. I didn't know there was another header convention that you were 
> > following.
> > In that case, it is fine to leave the name as-is.
>
> Thanks for confirming!
>
> Attach the new version 0001 patch which addressed all other comments.
>
> Best regards,
> Hou zj

Hello Hou-san,

1. Do we need to extend the test cases to review the leader_pid column
in the pg_stat tables?
2. Do we need to follow the naming convention for
'GetLeaderApplyWorkerPid', like the other functions in the same file
which start with 'logicalrep_'?

thanks
Shveta




Re: Question about initial logical decoding snapshot

2023-01-18 Thread shveta malik
Hello,

I was curious as to why we need a 3rd running_xact and wanted to learn
more about it, so I made a few changes to come up with a patch which
builds the snapshot using 2 running_xacts records. The motive is to
run the tests to see the failures/issues with this approach, to
understand the need for reading a 3rd running_xact to build a
consistent snapshot. With this patch, I got one test failure, which is
test_decoding/twophase_snapshot.

Approach:
When we start building a snapshot, on the occurrence of the first
running_xact, move the state from START to BUILDING and wait for all
in-progress transactions to finish. On the second running_xact where
we find oldestRunningXid >= the 1st xl_running_xact's nextXid, move to
the CONSISTENT state. This means all transactions started before the
BUILDING state have now finished, and all new transactions currently
in progress are ones started after the BUILDING state, for which we
have enough info to decode them.

Failure analysis for the twophase_snapshot test:
After applying the patch, the test case fails because the slot is created
sooner and 'PREPARE TRANSACTION test1' is available in the result of the
first 'pg_logical_slot_get_changes' itself. The intent of this test case
is to see how a two-phase txn is handled when the snapshot build completes
in 3 stages (BUILDING-->FULL-->CONSISTENT). Originally, the PREPARED txn
is started between the FULL and CONSISTENT stages and thus, as per the
current code logic, 'DecodePrepare' will skip it. Please see the code in
DecodePrepare:

    /* We can't start streaming unless a consistent state is reached. */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_CONSISTENT)
    {
        ReorderBufferSkipPrepare(ctx->reorder, xid);
        return;
    }

So the first 'pg_logical_slot_get_changes' will not show these changes.
Once we do 'COMMIT PREPARED' after the CONSISTENT state is reached, they
will be available for the next 'pg_logical_slot_get_changes' to consume.

On the other hand, with the current patch, since we reach the consistent
state sooner, the PREPARED transaction in the same test case now ends up
starting after the CONSISTENT state and is thus available to be consumed
by the first 'pg_logical_slot_get_changes' itself. This makes the test
case fail.

Please note that in the patch, I have retained the 'WAIT for all running
transactions to end' even after reaching the CONSISTENT state. I have also
tried running the tests after removing that WAIT after CONSISTENT; with
that, we get one more test failure, which is test_decoding/ondisk_startup.
The reason for the failure here is the same as in the previous case, i.e.,
since we reach the CONSISTENT state earlier, slot creation finishes faster
and thus we see a slight change in the result for this test ('step s1init
completed' appears earlier in the log file).

Both the failing tests here are written in such a way that they align
with the 3-phase snapshot build process. Otherwise, I do not see any
logical issues yet with this approach based on the test-cases
available so far.

So, I still have not gotten clarity on why we need the 3rd running_xact
here. In the code, I see a comment in SnapBuildFindSnapshot() which says
"c) ...But for older running transactions no viable snapshot exists
yet, so CONSISTENT will only be reached once all of those have
finished." This comment refers to txns started between the BUILDING and
FULL states. I do not understand it fully, and I am not sure what tests I
need to run on the patch to reproduce the issue where we do not have
a viable snapshot when we go by two running_xacts only.

Any thoughts/comments are most welcome. Attached the patch for review.

Thanks
Shveta

On Fri, Dec 30, 2022 at 11:57 PM Chong Wang  wrote:
>
> Hi hackers.
>
> I'm studying the source code for the creation of the initial logical decoding
> snapshot. What confused me is why we must process 3 xl_running_xacts
> before we get to the consistent state. I think we only need 2
> xl_running_xacts.
>
> I think we can get to the consistent state when we meet the 2nd xl_running_xact
> with its oldestRunningXid > the 1st xl_running_xact's nextXid. This means the
> active transactions in the 1st xl_running_xact have all committed, and we have all
> the logs of the transactions that will commit afterwards, so there is a consistent
> state at this time point and we can export a snapshot.
>
> I have read the discussion in [0] and the comment of commit '955a684', but I
> haven't found a detailed explanation of why we need 4 stages during the creation
> of the initial logical decoding snapshot rather than 3 stages.
>
> My recent job is relevant to logical decoding, so I want to figure this
> problem out. I would be very grateful if you could answer me, thanks.
>
>
>
> [0] 
> https://www.postgresql.org/message-id/flat/f37e975c-908f-858e-707f-058d3b1eb214%402ndquadrant.com
>
>
>
> --
>
> Best regards
>
> Chong Wang
>
> Greenplum DataFlow team


v1-0001-SnapBuild-in-2-stages.patch
Description: Binary data


Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-19 Thread shveta malik
On Thu, Jan 19, 2023 at 11:11 AM Amit Kapila  wrote:
>
> On Wed, Jan 18, 2023 at 12:09 PM Amit Kapila  wrote:
> >
> > On Fri, Jan 13, 2023 at 11:50 AM Peter Smith  wrote:
> > >
> > > Here are some review comments for patch v79-0002.
> > >
> >
> > So, this is about the latest v84-0001-Stop-extra-worker-if-GUC-was-changed.
> >
> > >
> > > I feel this patch just adds more complexity for almost no gain:
> > > - reducing the 'max_apply_workers_per_subscription' seems not very
> > > common in the first place.
> > > - even when the GUC is reduced, at that point in time all the workers
> > > might be in use so there may be nothing that can be immediately done.
> > > - IIUC the excess workers (for a reduced GUC) are going to get freed
> > > naturally anyway over time as more transactions are completed so the
> > > pool size will reduce accordingly.
> > >
> >
> > I am still not sure if it is worth pursuing this patch because of the
> > above reasons. I don't think it would be difficult to add this even at
> > a later point in time if we really see a use case for this.
> > Sawada-San, IIRC, you raised this point. What do you think?
> >
> > The other point I am wondering is whether we can have a different way
> > to test partial serialization apart from introducing another developer
> > GUC (stream_serialize_threshold). One possibility could be that we can
> > have a subscription option (parallel_send_timeout or something like
> > that) with some default value (current_timeout used in the patch)
> > which will be used only when streaming = parallel. Users may want to
> > wait for more time before serialization starts depending on the
> > workload (say when resource usage is high on a subscriber-side
> > machine, or there are concurrent long-running transactions that can
> > block parallel apply for a bit longer time). I know with this as well
> > it may not be straightforward to test the functionality because we
> > can't be sure how many changes would be required for a timeout to
> > occur. This is just for brainstorming other options to test the
> > partial serialization functionality.
> >
>
> Apart from the above, we can also have a subscription option to
> specify parallel_shm_queue_size (queue_size used to determine the
> queue between the leader and parallel worker) and that can be used for
> this purpose. Basically, configuring it to a smaller value can help in
> reducing the test time but still, it will not eliminate the need for
> dependency on timing we have to wait before switching to partial
> serialize mode. I think this can be used in production as well to tune
> the performance depending on workload.
>
> Yet another way is to use the existing parameter logical_decoding_mode
> [1]. If the value of logical_decoding_mode is 'immediate', then we can
> immediately switch to partial serialize mode. This will eliminate the
> dependency on timing. The one argument against using this is that it
> won't be as clear as a separate parameter like
> 'stream_serialize_threshold' proposed by the patch but OTOH we already
> have a few parameters that serve a different purpose when used on the
> subscriber. For example, 'max_replication_slots' is used to define the
> maximum number of replication slots on the publisher and the maximum
> number of origins on subscribers. Similarly,
> wal_retrieve_retry_interval' is used for different purposes on
> subscriber and standby nodes.
>
> [1] - https://www.postgresql.org/docs/devel/runtime-config-developer.html
>
> --
> With Regards,
> Amit Kapila.

Hi Amit,

On rethinking the complete model, what I feel is that the name
logical_decoding_mode does not really define modes of logical
decoding. We, I think, picked it based on logical_decoding_work_mem.
As per the current implementation, the parameter 'logical_decoding_mode'
tells what happens when the work-memory used by logical decoding reaches
its limit. So it is in fact a 'logicalrep_workmem_vacate_mode' or
'logicalrep_trans_eviction_mode'. If it is set to immediate, that means
vacate the work-memory immediately, or evict transactions immediately;
and 'buffered' means the reverse (i.e. keep buffering transactions until
we reach the limit). Now coming to subscribers, we can reuse the same
parameter. On the subscriber as well, the shared-memory queue could be
considered its workmem, and thus the name
'logicalrep_workmem_vacate_mode' might look more relevant.

On publisher:
logicalrep_workmem_vacate_mode=immediate, buffered.

On subscriber:
logicalrep_workmem_vacate_mode=stream_serialize  (or if we want to
keep immediate here too, that will also be fine).

Thoughts?
And I am assuming it is possible to change the GUC name before the
coming release. If not, please let me know, we can brainstorm other
ideas.

thanks
Shveta




Re: Perform streaming logical transactions by background workers and parallel apply

2023-01-19 Thread shveta malik
On Thu, Jan 19, 2023 at 3:44 PM shveta malik  wrote:
>
> On Thu, Jan 19, 2023 at 11:11 AM Amit Kapila  wrote:
> >
> > On Wed, Jan 18, 2023 at 12:09 PM Amit Kapila  
> > wrote:
> > >
> > > On Fri, Jan 13, 2023 at 11:50 AM Peter Smith  
> > > wrote:
> > > >
> > > > Here are some review comments for patch v79-0002.
> > > >
> > >
> > > So, this is about the latest 
> > > v84-0001-Stop-extra-worker-if-GUC-was-changed.
> > >
> > > >
> > > > I feel this patch just adds more complexity for almost no gain:
> > > > - reducing the 'max_apply_workers_per_subscription' seems not very
> > > > common in the first place.
> > > > - even when the GUC is reduced, at that point in time all the workers
> > > > might be in use so there may be nothing that can be immediately done.
> > > > - IIUC the excess workers (for a reduced GUC) are going to get freed
> > > > naturally anyway over time as more transactions are completed so the
> > > > pool size will reduce accordingly.
> > > >
> > >
> > > I am still not sure if it is worth pursuing this patch because of the
> > > above reasons. I don't think it would be difficult to add this even at
> > > a later point in time if we really see a use case for this.
> > > Sawada-San, IIRC, you raised this point. What do you think?
> > >
> > > The other point I am wondering is whether we can have a different way
> > > to test partial serialization apart from introducing another developer
> > > GUC (stream_serialize_threshold). One possibility could be that we can
> > > have a subscription option (parallel_send_timeout or something like
> > > that) with some default value (current_timeout used in the patch)
> > > which will be used only when streaming = parallel. Users may want to
> > > wait for more time before serialization starts depending on the
> > > workload (say when resource usage is high on a subscriber-side
> > > machine, or there are concurrent long-running transactions that can
> > > block parallel apply for a bit longer time). I know with this as well
> > > it may not be straightforward to test the functionality because we
> > > can't be sure how many changes would be required for a timeout to
> > > occur. This is just for brainstorming other options to test the
> > > partial serialization functionality.
> > >
> >
> > Apart from the above, we can also have a subscription option to
> > specify parallel_shm_queue_size (queue_size used to determine the
> > queue between the leader and parallel worker) and that can be used for
> > this purpose. Basically, configuring it to a smaller value can help in
> > reducing the test time but still, it will not eliminate the need for
> > dependency on timing we have to wait before switching to partial
> > serialize mode. I think this can be used in production as well to tune
> > the performance depending on workload.
> >
> > Yet another way is to use the existing parameter logical_decoding_mode
> > [1]. If the value of logical_decoding_mode is 'immediate', then we can
> > immediately switch to partial serialize mode. This will eliminate the
> > dependency on timing. The one argument against using this is that it
> > won't be as clear as a separate parameter like
> > 'stream_serialize_threshold' proposed by the patch but OTOH we already
> > have a few parameters that serve a different purpose when used on the
> > subscriber. For example, 'max_replication_slots' is used to define the
> > maximum number of replication slots on the publisher and the maximum
> > number of origins on subscribers. Similarly,
> > wal_retrieve_retry_interval' is used for different purposes on
> > subscriber and standby nodes.
> >
> > [1] - https://www.postgresql.org/docs/devel/runtime-config-developer.html
> >
> > --
> > With Regards,
> > Amit Kapila.
>
> Hi Amit,
>
> On rethinking the complete model, what I feel is that the name
> logical_decoding_mode is not something which defines modes of logical
> decoding. We, I think, picked it based on logical_decoding_work_mem.
> As per current implementation, the parameter 'logical_decoding_mode'
> tells what happens when work-memory used by logical decoding reaches
> its limit. So it is in-fact 'logicalrep_workmem_vacate_mode' or

Minor correction in what I said earlier:
As per current implementation, the parameter 

Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-19 Thread shveta malik
On Thu, Jan 19, 2023 at 12:42 PM Takamichi Osumi (Fujitsu)
 wrote:
>
> On Wednesday, January 18, 2023 4:06 PM Peter Smith  
> wrote:
> >  Here are my review comments for the latest patch v16-0001. (excluding the
> > test code)
> Hi, thank you for your review !
>
> > ==
> >
> > General
> >
> > 1.
> >
> > Since the value of min_apply_delay cannot be < 0,  I was thinking probably 
> > it
> > should have been declared everywhere in this patch as a
> > uint64 instead of an int64, right?
> No, we won't be able to adopt this idea.
>
> It seems that we are not able to use uint as a catalog type,
> so we can't apply it to the pg_subscription.h definitions, nor
> to the catalog variables stored via Int64GetDatum and the
> argument variable passed to Int64GetDatum.
>
> Plus, there is a possibility that the Interval type holds a negative
> value, so we are not able to change the int64 variable that gets
> the return value of interval2ms().
>
> > ==
> >
> > Commit message
> >
> > 2.
> >
> > If the subscription sets min_apply_delay parameter, the logical replication
> > worker will delay the transaction commit for min_apply_delay milliseconds.
> >
> > ~
> >
> > IMO there should be another sentence before this just to say that a new
> > parameter is being added:
> >
> > e.g.
> > This patch implements a new subscription parameter called
> > 'min_apply_delay'.
> Added.
>
>
> > ==
> >
> > doc/src/sgml/config.sgml
> >
> > 3.
> >
> > +  
> > +   For time-delayed logical replication, the apply worker sends a 
> > Standby
> > +   Status Update message to the corresponding publisher per the
> > indicated
> > +   time of this parameter. Therefore, if this parameter is longer than
> > +   wal_sender_timeout on the publisher, then the
> > +   walsender doesn't get any update message during the delay and
> > repeatedly
> > +   terminates due to the timeout errors. Hence, make sure this 
> > parameter
> > is
> > +   shorter than the wal_sender_timeout of the
> > publisher.
> > +   If this parameter is set to zero with time-delayed replication, the
> > +   apply worker doesn't send any feedback messages during the
> > +   min_apply_delay.
> > +  
> >
> >
> > This paragraph seemed confusing. I think it needs to be reworded to change 
> > all
> > of the "this parameter" references because there are at least 3 different
> > parameters mentioned in this paragraph. e.g. maybe just change them to
> > explicitly name the parameter you are talking about.
> >
> > I also think it needs to mention the ‘min_apply_delay’ subscription 
> > parameter
> > up-front and then refer to it appropriately.
> >
> > The end result might be something like I wrote below (this is just my guess 
> > ?
> > probably you can word it better).
> >
> > SUGGESTION
> > For time-delayed logical replication (i.e. when the subscription is created 
> > with
> > parameter min_apply_delay > 0), the apply worker sends a Standby Status
> > Update message to the publisher with a period of 
> > wal_receiver_status_interval .
> > Make sure to set wal_receiver_status_interval less than the
> > wal_sender_timeout on the publisher, otherwise, the walsender will 
> > repeatedly
> > terminate due to the timeout errors. If wal_receiver_status_interval is set 
> > to zero,
> > the apply worker doesn't send any feedback messages during the subscriber’s
> > min_apply_delay period.
> Applied. Also, I added one reference for min_apply_delay parameter
> at the end of this description.
>
>
> > ==
> >
> > doc/src/sgml/ref/create_subscription.sgml
> >
> > 4.
> >
> > + 
> > +  By default, the subscriber applies changes as soon as possible. 
> > As
> > +  with the physical replication feature
> > +  (), it can be
> > useful to
> > +  have a time-delayed logical replica. This parameter lets the 
> > user to
> > +  delay the application of changes by a specified amount of
> > time. If this
> > +  value is specified without units, it is taken as milliseconds. 
> > The
> > +  default is zero(no delay).
> > + 
> >
> > 4a.
> > As with the physical replication feature (recovery_min_apply_delay), it can 
> > be
> > useful to have a time-delayed logical replica.
> >
> > IMO not sure that the above sentence is necessary. It seems only to be 
> > saying
> > that this parameter can be useful. Why do we need to say that?
> Removed the sentence.
>
>
> > ~
> >
> > 4b.
> > "This parameter lets the user to delay" -> "This parameter lets the user 
> > delay"
> > OR
> > "This parameter lets the user to delay" -> "This parameter allows the user 
> > to
> > delay"
> Fixed.
>
>
> > ~
> >
> > 4c.
> > "If this value is specified without units" -> "If the value is specified 
> > without
> > units"
> Fixed.
>
> > ~
> >
> > 4d.
> > "zero(no delay)." -> "zero (no delay)."
> Fixed.
>
> > 
> >
> > 5.
> >
> > + 
> > +  The delay occurs only on WAL records for transaction begins and

Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-20 Thread shveta malik
On Fri, Jan 20, 2023 at 1:08 PM Peter Smith  wrote:

> a) the message should say that this is the *remaining* time left to wait.
>
> b) it might be convenient to know from the log what was the original
> min_apply_delay value in the 1st place.
>
> For example, the logs might look something like this:
>
> DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> 160000 ms. Remaining wait time: 159972 ms
> DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> 160000 ms. Remaining wait time: 142828 ms
> DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> 160000 ms. Remaining wait time: 129994 ms
> DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> 160000 ms. Remaining wait time: 110001 ms
> ...
>

+1
This will also help when min_apply_delay is set to a new value in the
middle of the current wait. Let's say I started with min_apply_delay = 5
min; when the worker was halfway through, I changed min_apply_delay to 3
min (or, say, 10 min). I see the impact of that change, i.e., the new
wait time is adjusted, but the log becomes confusing. So, please keep
this scenario in mind as well while improving the logging.

thanks
Shveta




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-20 Thread shveta malik
On Fri, Jan 20, 2023 at 2:23 PM shveta malik  wrote:
>
> On Fri, Jan 20, 2023 at 1:08 PM Peter Smith  wrote:
>
> > a) the message should say that this is the *remaining* time left to wait.
> >
> > b) it might be convenient to know from the log what was the original
> > min_apply_delay value in the 1st place.
> >
> > For example, the logs might look something like this:
> >
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 160000 ms. Remaining wait time: 159972 ms
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 160000 ms. Remaining wait time: 142828 ms
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 160000 ms. Remaining wait time: 129994 ms
> > DEBUG: time-delayed replication for txid 1234, min_apply_delay =
> > 160000 ms. Remaining wait time: 110001 ms
> > ...
> >
>
> +1
> This will also help when min_apply_delay is set to a new value in the
> middle of the current wait. Let's say I started with min_apply_delay = 5
> min; when the worker was halfway through, I changed min_apply_delay to 3
> min (or, say, 10 min). I see the impact of that change, i.e., the new
> wait time is adjusted, but the log becomes confusing. So, please keep
> this scenario in mind as well while improving the logging.
>


When we send feedback during the apply delay, after every
wal_receiver_status_interval, the log reads:
2023-01-19 17:12:56.000 IST [404795] DEBUG:  sending feedback (force 1)
to recv 0/1570840, write 0/1570840, flush 0/1570840

Shall we add some info here to indicate that it is sent while waiting
for the apply delay, to distinguish it from other such send-feedback
logs? It will make the apply-delay flow clearer in the logs.

thanks
Shveta




Re: Time delayed LR (WAS Re: logical replication restrictions)

2023-01-24 Thread shveta malik
On Tue, Jan 24, 2023 at 5:49 PM Takamichi Osumi (Fujitsu)
 wrote:
>
>
> Attached the patch v20 that has incorporated all comments so far.
> Kindly have a look at the attached patch.
>
>
> Best Regards,
> Takamichi Osumi
>

Thank you for the patch. My previous comments are addressed. I tested it
and it looks good. The logging is also fine now.

Just one comment: in the summary, we see:
If the subscription sets min_apply_delay parameter, the logical
replication worker will delay the transaction commit for
min_apply_delay milliseconds.

Would it be better to write "delay the transaction apply" instead of
"delay the transaction commit", just to be consistent, since we do not
actually delay the commit for regular transactions?

thanks
Shveta




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-01-25 Thread shveta malik
On Mon, Jan 23, 2023 at 6:30 PM Melih Mutlu  wrote:
>
> Hi,
>
> Thanks for your review.
> Attached updated versions of the patches.
>

Hello,
I am still in the process of reviewing the patch; before that, I tried
to run the test below:

--publisher
create table tab1(id int , name varchar);
create table tab2(id int primary key , name varchar);
create table tab3(id int primary key , name varchar);
Insert into tab1 values(10, 'a');
Insert into tab1 values(20, 'b');
Insert into tab1 values(30, 'c');

Insert into tab2 values(10, 'a');
Insert into tab2 values(20, 'b');
Insert into tab2 values(30, 'c');

Insert into tab3 values(10, 'a');
Insert into tab3 values(20, 'b');
Insert into tab3 values(30, 'c');

create publication mypub for table tab1, tab2, tab3;

--subscriber
create table tab1(id int , name varchar);
create table tab2(id int primary key , name varchar);
create table tab3(id int primary key , name varchar);
create subscription mysub connection 'dbname=postgres host=localhost
user=shveta port=5432' publication mypub;

--I see initial data copied, but new catalog columns srrelslotname
and srreloriginname are not updated:
postgres=# select sublastusedid from pg_subscription;
 sublastusedid
---------------
             2

postgres=# select * from pg_subscription_rel;
 srsubid | srrelid | srsubstate | srsublsn  | srrelslotname | srreloriginname
---------+---------+------------+-----------+---------------+-----------------
   16409 |   16384 | r          | 0/15219E0 |               |
   16409 |   16389 | r          | 0/15219E0 |               |
   16409 |   16396 | r          | 0/15219E0 |               |

When are these supposed to be updated? I thought the name of the created
slot would be recorded here. Am I missing something?

Also the v8 patch does not apply on HEAD, giving merge conflicts.

thanks
Shveta




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-01-27 Thread shveta malik
On Thu, Jan 26, 2023 at 7:53 PM Melih Mutlu  wrote:
>
> If a relation is currently being synced by a tablesync worker and uses a 
> replication slot/origin for that operation, then srrelslotname and 
> srreloriginname fields will have values.
> When a relation is done with its replication slot/origin, their info gets 
> removed from related catalog row, so that slot/origin can be reused for 
> another table or dropped if not needed anymore.
> In your case, all relations are in READY state so it's expected that 
> srrelslotname and srreloriginname are empty. READY relations do not need a 
> replication slot/origin anymore.
>
> Tables are probably synced so quickly that you're missing the moments when a 
> tablesync worker copies a relation and stores its rep. slot/origin in the 
> catalog.
> If initial sync is long enough, then you should be able to see the columns 
> get updated. I follow [1] to make it longer and test if the patch really 
> updates the catalog.
>

Thank You for the details. It is clear now.
>

>
> Rebased and resolved conflicts. Please check the new version
>
Please find my suggestions on v9:

1.
--Can we please add a few more points to the summary to make it clearer?
a) something stating that the reusability of workers is for tables under
one subscription, not across multiple subscriptions.
b) Since we are reusing both workers and slots, can we add:
--when we actually end up spawning a new worker
--when we actually end up creating a new slot in a worker rather
than using an existing one
--if we reuse existing slots, what happens to the snapshot?


2.
+   The last used ID for tablesync workers. This ID is used to
+   create replication slots. The last used ID needs to be stored
+   to make logical replication can safely proceed after any interruption.
+   If sublastusedid is 0, then no table has been synced yet.

--typo:
 to make logical replication can safely proceed ==> to make logical
replication safely proceed

--Also, does this sound better:
The last used ID for tablesync workers. It acts as a unique
identifier for the replication slots
which are created by tablesync workers. The last used ID needs to be
persisted...


3.
is_first_run;
move_to_next_rel;
--Do you think one variable is enough here, as we do not get any extra
info by using 2 variables? Can we have 1 that is more generic, like
'ready_to_reuse'? Otherwise, please let me know if we must use 2.


4.
/* missin_ok = true, since the origin might be already dropped. */
typo: missing_ok


5. GetReplicationSlotNamesBySubId:
errmsg("not tuple returned."));

Can we have a better error msg:
ereport(ERROR,
errmsg("could not receive list of slots
associated with subscription %d, error: %s", subid, res->err));

6.
static void
clean_sync_worker(void)

--can we please add an introductory comment for this function.

7.
/*
 * Pick the table for the next run if there is not another worker
 * already picked that table.
 */
Pick the table for the next run if it is not already picked up by
another worker.

8.
process_syncing_tables_for_sync()

/* Cleanup before next run or ending the worker. */
--can we please improve this comment:
if there is no more work left for this worker, stop the worker
gracefully; else do the clean-up and make it ready for the next
relation/run.

9.
LogicalRepSyncTableStart:
 * Read previous slot name from the catalog, if exists.
 */
prev_slotname = (char *) palloc0(NAMEDATALEN);
Do we need to free this at the end?


10.
if (strlen(prev_slotname) == 0)
{
elog(ERROR, "Replication slot could not be
found for relation %u",
 MyLogicalRepWorker->relid);
}
Shall we also mention the subid in the error msg?

I am reviewing further...
thanks
Shveta




Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-01-31 Thread shveta malik
On Fri, Jan 27, 2023 at 3:41 PM shveta malik  wrote:
>

>
> I am reviewing further...
> thanks
> Shveta

Few more comments:

v4-0001:

1)
REPLICATION_SLOT_SNAPSHOT
--Do we need a 'CREATE' prefix with it, i.e. CREATE_REPLICATION_SNAPSHOT
(or some other brief name with CREATE)? 'REPLICATION_SLOT_SNAPSHOT'
does not look like a command/action and is thus confusing.

2)
is used in the currenct transaction. This command is currently only supported
for logical replication.
slots.
--typo: currenct-->current
--slots can be moved to previous line

3)
/*
 * Signal that we don't need the timeout mechanism. We're just creating
 * the replication slot and don't yet accept feedback messages or send
 * keepalives. As we possibly need to wait for further WAL the walsender
 * would otherwise possibly be killed too soon.
 */
We're just creating the replication slot --> We're just creating the
replication snapshot


4)
I see an XactReadOnly check in CreateReplicationSlot; do we need the same
check in ReplicationSlotSnapshot() as well?


===
v9-0002:

5)
  /* We are safe to drop the replication trackin origin after this
--typo: tracking

6)
slot->data.catalog_xmin = xmin_horizon;
slot->effective_xmin = xmin_horizon;
SpinLockRelease(&slot->mutex);
xmin_horizon = GetOldestSafeDecodingTransactionId(!need_full_snapshot);
ReplicationSlotsComputeRequiredXmin(true);

--Do we need to set xmin_horizon in the slot after the
'GetOldestSafeDecodingTransactionId' call? Otherwise it will be left
as InvalidId in the slot. Is that intentional? I see that we do set the
correct xmin_horizon in builder->initial_xmin_horizon, but the slot is
carrying the invalid one.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2023-11-27 Thread shveta malik
On Mon, Nov 27, 2023 at 2:15 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 11/27/23 7:02 AM, Zhijie Hou (Fujitsu) wrote:
> > On Monday, November 27, 2023 12:03 PM Zhijie Hou (Fujitsu) 
> >  wrote:
> >>
> >> Attach the V38 patch set which addressed all comments in [1][2] except for 
> >> the
> >> ones that mentioned above.
> >>
> >> [1]
> >> https://www.postgresql.org/message-id/CAHut%2BPv-yu71ogj_hRi6cCtmD5
> >> 5bsyw7XTxj1Nq8yVFKpY3NDQ%40mail.gmail.com
> >> [2]
> >> https://www.postgresql.org/message-id/CAHut%2BPuEGX5kr0xh06yv8ndoA
> >> QvDNedoec1OqOq3GMxDN6p%3D9A%40mail.gmail.com
> >
> > I didn't increment the patch version, sorry for that. Attach the same patch 
> > set
> > but increment the patch version to V39.
>
> Thanks!
>
> It looks like v39 does not contain (some / all?) the changes that have been
> done in v38 [1].
>
> For example, slot_exists_in_list() still exists in v39 while it was renamed to
> validate_sync_slot() in v38.
>

Yes, I noticed that and informed Hou-san about it. New patches will
be posted soon with the correction. Meanwhile, please review v38
instead if you intend to review patch 0002 right now. v39 is supposed
to have changes in patch 0001 alone.

thanks
Shveta






Re: Synchronizing slots from primary to standby

2023-11-27 Thread shveta malik
On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila  wrote:
>
> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > Here is the updated version(v39_2) which include all the changes made in 
> > 0002.
> > Please use for review, and sorry for the confusion.
> >
>
> --- a/src/backend/replication/logical/launcher.c
> +++ b/src/backend/replication/logical/launcher.c
> @@ -8,20 +8,27 @@
>   *   src/backend/replication/logical/launcher.c
>   *
>   * NOTES
> - *   This module contains the logical replication worker launcher which
> - *   uses the background worker infrastructure to start the logical
> - *   replication workers for every enabled subscription.
> + *   This module contains the replication worker launcher which
> + *   uses the background worker infrastructure to:
> + *   a) start the logical replication workers for every enabled subscription
> + *  when not in standby_mode.
> + *   b) start the slot sync worker for logical failover slots synchronization
> + *  from the primary server when in standby_mode.
>
> I was wondering do we really need a launcher on standby to invoke
> sync-slot worker. If so, why? I guess it may be required for previous
> versions where we were managing work for multiple slot-sync workers
> which is also questionable in the sense of whether launcher is the
> right candidate for the same but now with the single slot-sync worker,
> it doesn't seem worth having it. What do you think?
>
> --

Yes, earlier a manager process was needed to manage multiple slot-sync
workers and distribute load among them, but now that does not seem
necessary. I gave it a try (PoC) and it seems to work well. If there
are no objections to this approach, I can share the patch soon.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2023-11-28 Thread shveta malik
On Tue, Nov 28, 2023 at 3:10 PM shveta malik  wrote:
>
> On Tue, Nov 28, 2023 at 12:19 PM Drouvot, Bertrand
>  wrote:
> >
> > Hi,
> >
> > On 11/28/23 4:13 AM, shveta malik wrote:
> > > On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila  
> > > wrote:
> > >>
> > >> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu)
> > >>  wrote:
> > >>>
> > >>> Here is the updated version(v39_2) which include all the changes made 
> > >>> in 0002.
> > >>> Please use for review, and sorry for the confusion.
> > >>>
> > >>
> > >> --- a/src/backend/replication/logical/launcher.c
> > >> +++ b/src/backend/replication/logical/launcher.c
> > >> @@ -8,20 +8,27 @@
> > >>*   src/backend/replication/logical/launcher.c
> > >>*
> > >>* NOTES
> > >> - *   This module contains the logical replication worker launcher which
> > >> - *   uses the background worker infrastructure to start the logical
> > >> - *   replication workers for every enabled subscription.
> > >> + *   This module contains the replication worker launcher which
> > >> + *   uses the background worker infrastructure to:
> > >> + *   a) start the logical replication workers for every enabled 
> > >> subscription
> > >> + *  when not in standby_mode.
> > >> + *   b) start the slot sync worker for logical failover slots 
> > >> synchronization
> > >> + *  from the primary server when in standby_mode.
> > >>
> > >> I was wondering do we really need a launcher on standby to invoke
> > >> sync-slot worker. If so, why? I guess it may be required for previous
> > >> versions where we were managing work for multiple slot-sync workers
> > >> which is also questionable in the sense of whether launcher is the
> > >> right candidate for the same but now with the single slot-sync worker,
> > >> it doesn't seem worth having it. What do you think?
> > >>
> > >> --
> > >
> > > Yes, earlier a manager process was needed to manage multiple slot-sync
> > > workers and distribute load among them, but now that does not seem
> > > necessary. I gave it a try (PoC) and it seems to work well.  If  there
> > > are no objections to this approach, I can share the patch soon.
> > >
> >
> > +1 on this new approach, thanks!
>
> PFA v40. This patch has removed Logical Replication Launcher support
> to launch slotsync worker.  The slot-sync worker is now registered as
> bgworker with postmaster, with
> bgw_start_time=BgWorkerStart_ConsistentState and
> bgw_restart_time=60sec.
>
> On removal of launcher, now all the validity checks have been shifted
> to slot-sync worker itself.  This brings us to some point of concerns:
>
> a) We still need to maintain  RecoveryInProgress() check in slotsync
> worker. Since worker has the start time of
> BgWorkerStart_ConsistentState, it will be started on non-standby as
> well. So to ensure that it exits on non-standby, "RecoveryInProgress"
> has been introduced at the beginning of the worker. But once it exits,
> postmaster will not restart it since it is a clean exit i.e.
> proc_exit(0) (the restart logic of postmaster comes into play only
> when there is an abnormal exit). But to exit for the first time on
> non-standby, we need that Recovery related check in worker.
>
> b) "enable_syncslot" check is moved to slotsync worker now. Since
> enable_syncslot is PGC_SIGHUP, so proc_exit(1) is currently used to
> exit the worker if 'enable_syncslot' is found to be disabled.
> 'proc_exit(1)' has been used in order to ensure that the worker is
> restarted and GUCs are checked again after restart_time. Downside of
> this approach is, if someone has kept "enable_syncslot" as disabled
> permanently even on standby, slotsync worker will keep on restarting
> and exiting.
>
> So to overcome the above pain-points, I think a potential approach
> will be to start slotsync worker only if 'enable_syncslot' is on and
> the system is non-standby. Potential ways (each with some issues) are:
>

Correction here:  start slotsync worker only if 'enable_syncslot' is
on and the system is standby.

> 1) Use the current way i.e. register slot-sync worker as bgworker with
> postmaster, but introduce extra checks in 'maybe_start_bgworkers'. But
> this seems more like a hack. This will need extra changes as currently
> once 'maybe_start_bgw

Re: Synchronizing slots from primary to standby

2023-11-28 Thread shveta malik
On Fri, Nov 17, 2023 at 5:08 PM Amit Kapila  wrote:
>
> On Thu, Nov 16, 2023 at 5:34 PM shveta malik  wrote:
> >
> > PFA v35.
> >
>
> Review v35-0002*
> ==
> 1.
> As quoted in the commit message,
> >
> If a logical slot is invalidated on the primary, slot on the standby is also
> invalidated. If a logical slot on the primary is valid but is invalidated
> on the standby due to conflict (say required rows removed on the primary),
> then that slot is dropped and recreated on the standby in next sync-cycle.
> It is okay to recreate such slots as long as these are not consumable on the
> standby (which is the case currently).
> >
>
> I think this won't happen normally because of the physical slot and
> hot_standby_feedback but probably can occur in cases like if the user
> temporarily switches hot_standby_feedback from on to off. Are there
> any other reasons? I think we can mention the cases along with it as
> well at least for now. Additionally, I think this should be covered in
> code comments as well.
>
> 2.
>  #include "postgres.h"
> -
> +#include "access/genam.h"
>
> Spurious line removal.
>
> 3.
>A password needs to be provided too, if the sender demands password
>authentication.  It can be provided in the
>primary_conninfo string, or in a separate
> -  ~/.pgpass file on the standby server (use
> -  replication as the database name).
> -  Do not specify a database name in the
> -  primary_conninfo string.
> +  ~/.pgpass file on the standby server.
> + 
> + 
> +  Specify dbname in
> +  primary_conninfo string to allow synchronization
> +  of slots from the primary server to the standby server.
> +  This will only be used for slot synchronization. It is ignored
> +  for streaming.
>
> Is there a reason to remove part of the earlier sentence "use
> replication as the database name"?
>
> 4.
> +   enable_syncslot configuration
> parameter
> +  
> +  
> +  
> +   
> +It enables a physical standby to synchronize logical failover slots
> +from the primary server so that logical subscribers are not blocked
> +after failover.
> +   
> +   
> +It is enabled by default. This parameter can only be set in the
> +postgresql.conf file or on the server
> command line.
> +   
>
> I think you forgot to update the documentation for the default value
> of this variable.
>
> 5.
> + *   a) start the logical replication workers for every enabled subscription
> + *  when not in standby_mode
> + *   b) start the slot-sync worker for logical failover slots synchronization
> + *  from the primary server when in standby_mode.
>
> Either use a full stop after both lines or none of these.
>
> 6.
> +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker);
>
> There shouldn't be space between * and the worker.
>
> 7.
> + if (!SlotSyncWorker->hdr.in_use)
> + {
> + LWLockRelease(SlotSyncWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replication slot-sync worker not initialized, "
> + "cannot attach")));
> + }
> +
> + if (SlotSyncWorker->hdr.proc)
> + {
> + LWLockRelease(SlotSyncWorkerLock);
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("replication slot-sync worker is "
> + "already running, cannot attach")));
> + }
>
> Using slot-sync in the error messages looks a bit odd to me. Can we
> use  "replication slot sync worker ..." in both these and other
> similar messages? I think it would be better if we don't split the
> messages into multiple lines in these cases as messages don't appear
> too long to me.
>
> 8.
> +/*
> + * Detach the worker from DSM and update 'proc' and 'in_use'.
> + * Logical replication launcher will come to know using these
> + * that the worker has shutdown.
> + */
> +void
> +slotsync_worker_detach(int code, Datum arg)
> +{
>
> I think the reference to DSM is leftover from the previous version of
> the patch. Can we change the above comments as per the new code?
>
> 9.
> +static bool
> +slotsync_worker_launch()
> {
> ...
> + /* TODO: do we really need 'generation', analyse more here */
> + worker->hdr.generation++;
>
> We should do something about this TODO. As per my understanding, we
> don't need a generation number for the slot sync

Re: Synchronizing slots from primary to standby

2023-11-28 Thread shveta malik
On Tue, Nov 28, 2023 at 9:28 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 11/28/23 10:40 AM, shveta malik wrote:
> > On Tue, Nov 28, 2023 at 12:19 PM Drouvot, Bertrand
> >  wrote:
> >>
> >> Hi,
> >>
> >> On 11/28/23 4:13 AM, shveta malik wrote:
> >>> On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila  
> >>> wrote:
> >>>>
> >>>> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu)
> >>>>  wrote:
> >>>>>
> >>>>> Here is the updated version(v39_2) which include all the changes made 
> >>>>> in 0002.
> >>>>> Please use for review, and sorry for the confusion.
> >>>>>
> >>>>
> >>>> --- a/src/backend/replication/logical/launcher.c
> >>>> +++ b/src/backend/replication/logical/launcher.c
> >>>> @@ -8,20 +8,27 @@
> >>>> *   src/backend/replication/logical/launcher.c
> >>>> *
> >>>> * NOTES
> >>>> - *   This module contains the logical replication worker launcher which
> >>>> - *   uses the background worker infrastructure to start the logical
> >>>> - *   replication workers for every enabled subscription.
> >>>> + *   This module contains the replication worker launcher which
> >>>> + *   uses the background worker infrastructure to:
> >>>> + *   a) start the logical replication workers for every enabled 
> >>>> subscription
> >>>> + *  when not in standby_mode.
> >>>> + *   b) start the slot sync worker for logical failover slots 
> >>>> synchronization
> >>>> + *  from the primary server when in standby_mode.
> >>>>
> >>>> I was wondering do we really need a launcher on standby to invoke
> >>>> sync-slot worker. If so, why? I guess it may be required for previous
> >>>> versions where we were managing work for multiple slot-sync workers
> >>>> which is also questionable in the sense of whether launcher is the
> >>>> right candidate for the same but now with the single slot-sync worker,
> >>>> it doesn't seem worth having it. What do you think?
> >>>>
> >>>> --
> >>>
> >>> Yes, earlier a manager process was needed to manage multiple slot-sync
> >>> workers and distribute load among them, but now that does not seem
> >>> necessary. I gave it a try (PoC) and it seems to work well.  If  there
> >>> are no objections to this approach, I can share the patch soon.
> >>>
> >>
> >> +1 on this new approach, thanks!
> >
> > PFA v40. This patch has removed Logical Replication Launcher support
> > to launch slotsync worker.
>
> Thanks!
>
> >  The slot-sync worker is now registered as
> > bgworker with postmaster, with
> > bgw_start_time=BgWorkerStart_ConsistentState and
> > bgw_restart_time=60sec.
> >
> > On removal of launcher, now all the validity checks have been shifted
> > to slot-sync worker itself.  This brings us to some point of concerns:
> >
> > a) We still need to maintain  RecoveryInProgress() check in slotsync
> > worker. Since worker has the start time of
> > BgWorkerStart_ConsistentState, it will be started on non-standby as
> > well. So to ensure that it exits on non-standby, "RecoveryInProgress"
> > has been introduced at the beginning of the worker. But once it exits,
> > postmaster will not restart it since it is a clean exit i.e.
> > proc_exit(0) (the restart logic of postmaster comes into play only
> > when there is an abnormal exit). But to exit for the first time on
> > non-standby, we need that Recovery related check in worker.
> >
> > b) "enable_syncslot" check is moved to slotsync worker now. Since
> > enable_syncslot is PGC_SIGHUP, so proc_exit(1) is currently used to
> > exit the worker if 'enable_syncslot' is found to be disabled.
> > 'proc_exit(1)' has been used in order to ensure that the worker is
> > restarted and GUCs are checked again after restart_time. Downside of
> > this approach is, if someone has kept "enable_syncslot" as disabled
> > permanently even on standby, slotsync worker will keep on restarting
> > and exiting.
> >
> > So to overcome the above pain-points, I think a potential approach
> > will be to start slotsync worker only if 'enable_syncslot' is on and
> > the system is non-standby.

Re: Synchronizing slots from primary to standby

2023-11-30 Thread shveta malik
On Thu, Nov 30, 2023 at 5:37 PM Ajin Cherian  wrote:
>
> On Wed, Nov 29, 2023 at 8:17 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > This has been fixed.
> >
> > Best Regards,
> > Hou zj
>
> Thanks for addressing my comments. Some comments from my testing of patch v41
>
> 1. In my opinion, the second message "aborting the wait...moving to
> the next slot" does not hold much value. There might not even be a
> "next slot", there might be just one slot. I think the first LOG is
> enough to indicate that the sync-slot is waiting as it repeats this
> log till the slot catches up. I know these messages hold great value
> for debugging but in production,  "waiting..", "aborting the wait.."
> might not be as helpful, maybe change it to debug?
>
> 2023-11-30 05:13:49.811 EST [6115] LOG:  waiting for remote slot
> "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN
> (0/3047AC8) and catalog xmin (745)
> 2023-11-30 05:13:57.909 EST [6115] LOG:  aborting the wait for remote
> slot "sub1" and moving to the next slot, will attempt creating it
> again
> 2023-11-30 05:14:07.921 EST [6115] LOG:  waiting for remote slot
> "sub1" LSN (0/3047A90) and catalog xmin (745) to pass local slot LSN
> (0/3047AC8) and catalog xmin (745)
>

Sure, the message can be trimmed down. But I am not very sure if we
should convert it to DEBUG. It might be useful to know what exactly is
happening with this slot through the log file. Curious to know what
others think here?

>
> 2. If a slot on the standby is in the "i" state as it hasn't been
> synced and it was invalidated on the primary, should you continuously
> retry creating this invalidated slot on the standby?
>
> 2023-11-30 06:21:41.844 EST [10563] LOG:  waiting for remote slot
> "sub1" LSN (0/0) and catalog xmin (785) to pass local slot LSN
> (0/EED9330) and catalog xmin (785)
> 2023-11-30 06:21:41.845 EST [10563] WARNING:  slot "sub1" invalidated
> on the primary server, slot creation aborted
> 2023-11-30 06:21:51.892 EST [10563] LOG:  waiting for remote slot
> "sub1" LSN (0/0) and catalog xmin (785) to pass local slot LSN
> (0/EED9330) and catalog xmin (785)
> 2023-11-30 06:21:51.893 EST [10563] WARNING:  slot "sub1" invalidated
> on the primary server, slot creation aborted
>

No, it should not be synced after that. It should be marked as
invalidated and skipped. And perhaps the state should also be moved to
'r' as we are done with it; but since it is invalidated, it will not
be used further even in the 'r' state.

> 3. If creation of a slot on the standby fails for one slot because a
> slot of the same name exists, then thereafter no new sync slots are
> created on standby. Is this expected? I do see that previously created
> slots are kept up to date, just that no new slots are created after
> that.
>

yes, it is done as per the suggestion/discussion in [1], so that users
can catch this issue at the earliest.

[1]: 
https://www.postgresql.org/message-id/CAA4eK1J5D-Z7dFa89acf7O%2BCa6Y9bygTpi52KAKVCg%2BPE%2BZfog%40mail.gmail.com

thanks
Shveta




Re: Synchronizing slots from primary to standby

2023-11-30 Thread shveta malik
On Fri, Dec 1, 2023 at 9:40 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Wednesday, November 29, 2023 5:12 PM Zhijie Hou (Fujitsu) 
>  wrote:
>
> I was reviewing slotsync worker design and here
> are few comments on 0002 patch:

Thanks for reviewing the patch.

>
> 1.
>
> +   if (!WalRcv ||
> +   (WalRcv->slotname[0] == '\0') ||
> +   XLogRecPtrIsInvalid(WalRcv->latestWalEnd))
>
> I think we'd better take spinlock when accessing these shared memory fields.
>
> 2.
>
> /*
>  * The slot sync feature itself is disabled, exit.
>  */
> if (!enable_syncslot)
> {
> ereport(LOG,
> errmsg("exiting slot sync worker as 
> enable_syncslot is disabled."));
>
> Can we check the GUC when registering the worker(SlotSyncWorkerRegister),
> so that the worker won't be started if enable_syncslot is false.
>
> 3. In synchronize_one_slot, do we need to skip the slot sync and drop if the
> local slot is a physical one ?
>

IMO, if a local slot exists which is a physical one, it will be a user
created slot and in that case worker will error out on finding
existing slot with same name. And the case where local slot is
physical one but not user-created is not possible on standby (assuming
we have correct check on primary disallowing setting 'failover'
property for physical slot). Do you have some other scenario in mind,
which I am missing here?

> 4.
>
> *locally_invalidated =
> (remote_slot->invalidated == RS_INVAL_NONE) &&
> (local_slot->data.invalidated != 
> RS_INVAL_NONE);
>
> When reading the invalidated flag of local slot, I think we'd better take
> spinlock.
>
> Best Regards,
> Hou zj




Re: Synchronizing slots from primary to standby

2023-11-30 Thread shveta malik
On Fri, Dec 1, 2023 at 11:17 AM Zhijie Hou (Fujitsu)
 wrote:
>
> On Friday, December 1, 2023 12:51 PM shveta malik  
> wrote:
>
> Hi,
>
> >
> > On Fri, Dec 1, 2023 at 9:40 AM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > On Wednesday, November 29, 2023 5:12 PM Zhijie Hou (Fujitsu)
> >  wrote:
> > >
> > > I was reviewing slotsync worker design and here
> > > are few comments on 0002 patch:
> >
> > Thanks for reviewing the patch.
> >
> > >
> > >
> > > 3. In synchronize_one_slot, do we need to skip the slot sync and drop if 
> > > the
> > > local slot is a physical one ?
> > >
> >
> > IMO, if a local slot exists which is a physical one, it will be a user
> > created slot and in that case worker will error out on finding
> > existing slot with same name. And the case where local slot is
> > physical one but not user-created is not possible on standby (assuming
> > we have correct check on primary disallowing setting 'failover'
> > property for physical slot). Do you have some other scenario in mind,
> > which I am missing here?
>
> I was thinking about the race condition when it has confirmed that the slot is
> not a user created one and enter "sync_state == SYNCSLOT_STATE_READY" branch,
> but at this moment, if someone uses "DROP_REPLICATION_SLOT" to drop this slot 
> and
> recreate another one(e.g. a physical one), then the slotsync worker will
> overwrite the fields of this physical slot. Although this affects user created
> logical slots in similar cases as well.
>

User cannot drop the synced slots on standby. It should result in an
ERROR. Currently we emit this error in pg_drop_replication_slot();
the same is needed in the "DROP_REPLICATION_SLOT" replication command. I will
change it. Thanks for raising this point. I think, after this ERROR,
there is no need to worry about physical slots handling in
synchronize_one_slot().

> And the same is true for slotsync_drop_initiated_slots() and
> drop_obsolete_slots(), as we don't lock the slots in the list, if user tries to
> drop and re-create old slot concurrently, then we could drop user created slot
> here.
>




Re: Synchronizing slots from primary to standby

2023-12-01 Thread shveta malik
On Fri, Dec 1, 2023 at 3:43 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 11/30/23 1:06 PM, Ajin Cherian wrote:
> > On Wed, Nov 29, 2023 at 8:17 PM Zhijie Hou (Fujitsu)
> >
> > 3. If creation of a slot on the standby fails for one slot because a
> > slot of the same name exists, then thereafter no new sync slots are
> > created on standby. Is this expected? I do see that previously created
> > slots are kept up to date, just that no new slots are created after
> > that.
>
> Yes this is the expected behavior as per discussion in [1].
> Does this behavior make sense to you?
>
Not completely. The chances of slots getting synced in this case seem
order-based. Every time the worker restarts after the error (assuming
the user has not taken corrective action yet), it will successfully
sync the slots prior to the problematic one, while leaving the ones
after it un-synced.

I need a little more clarity on what the way is for the user to know
that the slot-sync worker (or any background worker, for that matter) has
errored out. Is it only from the log file, or are there other
mechanisms for this? I mean, do ERRORs have a better chance of
catching the user's attention than WARNINGs in the context of a
background worker? I feel we can give this a second thought and see if it is
more appropriate to keep syncing the rest of the slots and skip the
duplicate-name one.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2023-12-03 Thread shveta malik
On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond  wrote:
>
> Review for v41 patch.

Thanks for the feedback.

>
> 1.
> ==
> src/backend/utils/misc/postgresql.conf.sample
>
> +#enable_syncslot = on # enables slot synchronization on the physical
> standby from the primary
>
> enable_syncslot is disabled by default, so, it should be 'off' here.
>

Sure, I will change it.

> ~~~
> 2.
> IIUC, the slotsyncworker's connection to the primary is to execute a
> query. Its aim is not walsender type connection, but at primary when
> queried, the 'backend_type' is set to 'walsender'.
> Snippet from primary db-
>
> datname  |   usename   | application_name | wait_event_type | backend_type
> -+-+--+-+--
> postgres | replication | slotsyncworker   | Client  | walsender
>
> Is it okay?
>

Slot sync worker uses 'libpqrcv_connect' for the connection, which sends
the 'replication'-'database' key-value pair as one of the connection
options. And on the primary side, 'ProcessStartupPacket', on the basis
of this key-value pair, sets the process up as a walsender (am_walsender
= true). And thus this reflects as backend_type='walsender' in
pg_stat_activity. I do not see any harm in this backend_type for the
slot-sync worker currently. This is on similar lines to the connections
used for logical replication. And since a slot-sync worker also deals
with WAL positions (LSNs), it is okay to maintain backend_type as
walsender unless you (or others) see any potential issue in doing
that. So let me know.
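
For reference, a standby configuration exercising this connection path
might look like the following. Hostnames, the slot name, and the dbname
are illustrative, and enable_syncslot plus the dbname-in-primary_conninfo
requirement are per the patch under discussion, not released behaviour:

```
# postgresql.conf on the standby -- illustrative values
primary_conninfo = 'host=primary.example.com port=5432 user=replication dbname=postgres'
primary_slot_name = 'sb1_slot'
enable_syncslot = on
hot_standby_feedback = on
```

The dbname here is used only by the slot-sync connection (it is ignored
for streaming), which is why the connection shows up on the primary as
a walsender with a database attached.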

> ~~~
> 3.
> As per current logic, If there are slots on primary with disabled
> subscriptions, then, when standby is created it replicates these slots
> but can't make them sync-ready until any activity happens on the
> slots.
> So, such slots stay in 'i' sync-state and get dropped when failover
> happens. Now, if the subscriber tries to enable their existing
> subscription after failover, it gives an error that the slot does not
> exist.
>

yes, this is expected as Amit explained in [1]. But let me review whether
we need to document this case, i.e. that a disabled subscription, if
enabled after promotion, might not work.

> ~~~
> 4. primary_slot_name GUC value test:
>
> When standby is started with a non-existing primary_slot_name, the
> wal-receiver gives an error but the slot-sync worker does not raise
> any error/warning. It is no-op though as it has a check 'if
> (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'.   Is this
> okay or shall the slot-sync worker too raise an error and exit?
>
> In another case, when standby is started with valid primary_slot_name,
> but it is changed to some invalid value in runtime, then walreceiver
> starts giving error but the slot-sync worker keeps on running. In this
> case, unlike the previous case, it even did not go to no-op mode (as
> it sees valid WalRcv->latestWalEnd from the earlier run) and keep
> pinging primary repeatedly for slots.  Shall here it should error out
> or at least be no-op until we give a valid primary_slot_name?
>

I reviewed it. There is no way to test the existence/validity of
'primary_slot_name' on standby without making a connection to the primary.
If primary_slot_name is invalid from the start, the slot-sync worker will
be a no-op (as you tested) as WalRcv->latestWalEnd will be invalid, and
if 'primary_slot_name' is changed to an invalid value at runtime, the
slot-sync worker will still keep pinging the primary. But that should be
okay (in fact needed) as it needs to sync at least the previous slots'
positions (in case it was delayed in doing so for some reason earlier).
And once the slots are up-to-date on standby, even if the worker pings
the primary, it will not see any change in slot LSNs and will thus take
a longer nap. I think it is not worth the effort to introduce the
complexity of checking the validity of 'primary_slot_name' on the primary
from the standby for this rare scenario.

It will be good to know thoughts of others on above 3 points.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2023-12-03 Thread shveta malik
On Mon, Dec 4, 2023 at 10:40 AM shveta malik  wrote:
>
> On Fri, Dec 1, 2023 at 5:40 PM Nisha Moond  wrote:
> >
> > Review for v41 patch.
>
> Thanks for the feedback.
>
> >
> > 1.
> > ==
> > src/backend/utils/misc/postgresql.conf.sample
> >
> > +#enable_syncslot = on # enables slot synchronization on the physical
> > standby from the primary
> >
> > enable_syncslot is disabled by default, so, it should be 'off' here.
> >
>
> Sure, I will change it.
>
> > ~~~
> > 2.
> > IIUC, the slotsyncworker's connection to the primary is to execute a
> > query. Its aim is not walsender type connection, but at primary when
> > queried, the 'backend_type' is set to 'walsender'.
> > Snippet from primary db-
> >
> > datname  |   usename   | application_name | wait_event_type | backend_type
> > -+-+--+-+--
> > postgres | replication | slotsyncworker   | Client  | walsender
> >
> > Is it okay?
> >
>
> Slot sync worker uses 'libpqrcv_connect' for connection which sends
> 'replication'-'database' key-value pair as one of the connection
> options. And on the primary side, 'ProcessStartupPacket' on the basis
> of this key-value pair sets the process as walsender one (am_walsender
> = true).
> And thus this reflects as backend_type='walsender' in
> pg_stat_activity. I do not see any harm in this backend_type for
> slot-sync worker currently. This is on a similar line of connections
> used for logical-replications. And since a slot-sync worker also deals
> with wals-positions (lsns), it is okay to maintain backend_type as
> walsender unless you (or others) see any potential issue in doing
> that. So let me know.
>
> > ~~~
> > 3.
> > As per current logic, If there are slots on primary with disabled
> > subscriptions, then, when standby is created it replicates these slots
> > but can't make them sync-ready until any activity happens on the
> > slots.
> > So, such slots stay in 'i' sync-state and get dropped when failover
> > happens. Now, if the subscriber tries to enable their existing
> > subscription after failover, it gives an error that the slot does not
> > exist.
> >
>
> yes, this is expected as Amit explained in [1]. But let me review if
> we need to document this case for disabled subscriptions. i.e.
> disabled subscription if enabled after promotion might not work.

Sorry, missed to mention the link earlier:
[1]: 
https://www.postgresql.org/message-id/CAA4eK1J5Hxp%2BzhvptyyjqQ4JSQzwnkFRXtQn8v9opxtZmmY_Ug%40mail.gmail.com



