Re: Function to get invalidation cause of a replication slot.
On Thu, Dec 21, 2023 at 2:59 PM Amit Kapila wrote: > > On Thu, Dec 21, 2023 at 12:07 PM Michael Paquier wrote: > > > > On Thu, Dec 21, 2023 at 11:53:04AM +0530, Amit Kapila wrote: > > > On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier > > > wrote: > > > Yeah, if one uses them independently then there is no such guarantee. > > > > This could be possible in the same query as well, still less likely, > > as the contents are volatile. > > > > True, this is quite obvious but that was not a recommended way to use > the function. Anyway, now that we agree to expose it via an existing > function, there is no point in further argument on this. > > > >> A lot could happen between both function calls while the > > >> repslot LWLock is not hold. > > >> > > >> Yeah, you could keep the reason text as NULL when there is no > > >> conflict, replacing the boolean by the text in the function, and keep > > >> the view definition compatible with v16 while adding an extra column. > > > > > > But as mentioned we also want the enum value to be exposed in some way > > > so that it can be used by the sync slot feature [1] as well, > > > otherwise, we may need some mappings to convert the text back to an > > > enum. I guess if we want to expose via view, then we can return an > > > enum value by pg_get_replication_slots() and the view can replace it > > > with text based on the value. > > > > Sure. Something like is OK by me as long as the data is retrieved > > from a single scan of the slot data while holding the slot data's > > LWLock. > > > > Okay, so let's go this way unless someone feels otherwise. Please track the progress in another thread [1] where the patch is posted today. [1]: https://www.postgresql.org/message-id/ZYOE8IguqTbp-seF%40paquier.xyz thanks Shveta
Re: Function to get invalidation cause of a replication slot.
On Thu, Dec 21, 2023 at 12:07 PM Michael Paquier wrote: > > On Thu, Dec 21, 2023 at 11:53:04AM +0530, Amit Kapila wrote: > > On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier > > wrote: > > Yeah, if one uses them independently then there is no such guarantee. > > This could be possible in the same query as well, still less likely, > as the contents are volatile. > True, this is quite obvious but that was not a recommended way to use the function. Anyway, now that we agree to expose it via an existing function, there is no point in further argument on this. > >> A lot could happen between both function calls while the > >> repslot LWLock is not hold. > >> > >> Yeah, you could keep the reason text as NULL when there is no > >> conflict, replacing the boolean by the text in the function, and keep > >> the view definition compatible with v16 while adding an extra column. > > > > But as mentioned we also want the enum value to be exposed in some way > > so that it can be used by the sync slot feature [1] as well, > > otherwise, we may need some mappings to convert the text back to an > > enum. I guess if we want to expose via view, then we can return an > > enum value by pg_get_replication_slots() and the view can replace it > > with text based on the value. > > Sure. Something like is OK by me as long as the data is retrieved > from a single scan of the slot data while holding the slot data's > LWLock. > Okay, so let's go this way unless someone feels otherwise. -- With Regards, Amit Kapila.
Re: Function to get invalidation cause of a replication slot.
On Thu, Dec 21, 2023 at 11:53:04AM +0530, Amit Kapila wrote: > On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier wrote: > Yeah, if one uses them independently then there is no such guarantee. This could be possible in the same query as well, still less likely, as the contents are volatile. >> A lot could happen between both function calls while the >> repslot LWLock is not hold. >> >> Yeah, you could keep the reason text as NULL when there is no >> conflict, replacing the boolean by the text in the function, and keep >> the view definition compatible with v16 while adding an extra column. > > But as mentioned we also want the enum value to be exposed in some way > so that it can be used by the sync slot feature [1] as well, > otherwise, we may need some mappings to convert the text back to an > enum. I guess if we want to expose via view, then we can return an > enum value by pg_get_replication_slots() and the view can replace it > with text based on the value. Sure. Something like is OK by me as long as the data is retrieved from a single scan of the slot data while holding the slot data's LWLock. -- Michael signature.asc Description: PGP signature
Re: Function to get invalidation cause of a replication slot.
On Thu, Dec 21, 2023 at 11:18 AM Michael Paquier wrote: > > On Thu, Dec 21, 2023 at 09:37:39AM +0530, Amit Kapila wrote: > > It depends on how one uses the function. For example, if one finds > > there is a conflict via pg_get_replication_slots() and wants to check > > the reason for the same via this new function then it would give the > > correct answer. > > My point is that we may not get the "correct" answer at all :/ > > What guarantee do you have that between the scan of > pg_get_replication_slots() to get the value of "conflicting" and the > call of pg_get_slot_invalidation_cause() the slot state will still be > the same? > Yeah, if one uses them independently then there is no such guarantee. > A lot could happen between both function calls while the > repslot LWLock is not hold. > > > Now, if we think it is okay to have two columns > > 'conflicting' and 'conflict_reason/conflict_cause' to be returned by > > pg_get_replication_slots() then that should serve the purpose as well > > but one can argue that 'conflicting' is deducible from > > 'conflict_reason'. > > Yeah, you could keep the reason text as NULL when there is no > conflict, replacing the boolean by the text in the function, and keep > the view definition compatible with v16 while adding an extra column. > But as mentioned we also want the enum value to be exposed in some way so that it can be used by the sync slot feature [1] as well, otherwise, we may need some mappings to convert the text back to an enum. I guess if we want to expose via view, then we can return an enum value by pg_get_replication_slots() and the view can replace it with text based on the value. [1] - https://www.postgresql.org/message-id/OS0PR01MB5716DAF72265388A2AD424119495A%40OS0PR01MB5716.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
Re: Function to get invalidation cause of a replication slot.
On Thu, Dec 21, 2023 at 09:37:39AM +0530, Amit Kapila wrote: > It depends on how one uses the function. For example, if one finds > there is a conflict via pg_get_replication_slots() and wants to check > the reason for the same via this new function then it would give the > correct answer. My point is that we may not get the "correct" answer at all :/ What guarantee do you have that between the scan of pg_get_replication_slots() to get the value of "conflicting" and the call of pg_get_slot_invalidation_cause() the slot state will still be the same? A lot could happen between both function calls while the repslot LWLock is not hold. > Now, if we think it is okay to have two columns > 'conflicting' and 'conflict_reason/conflict_cause' to be returned by > pg_get_replication_slots() then that should serve the purpose as well > but one can argue that 'conflicting' is deducible from > 'conflict_reason'. Yeah, you could keep the reason text as NULL when there is no conflict, replacing the boolean by the text in the function, and keep the view definition compatible with v16 while adding an extra column. -- Michael signature.asc Description: PGP signature
Re: Function to get invalidation cause of a replication slot.
On Thu, Dec 21, 2023 at 8:47 AM Michael Paquier wrote: > > On Wed, Dec 20, 2023 at 12:43:45PM +0100, Drouvot, Bertrand wrote: > > + 3 = wal_level insufficient on the primary > > server > > > > "." is missing at the end (to be consistent with 1 and 2). Same > > in the commit message. > > > > + * Returns ReplicationSlotInvalidationCause enum value for valid slot_name; > > > > Not sure the sentence should finish with ";". > > > > Another Nit is to add a comment in ReplicationSlotInvalidationCause > > definition (slot.h) > > that any new enum values (if any) should be added after the ones that are > > already defined (to > > provide some consistency across changes in this area if any). > > > > Except the above Nit(s) the patch LGTM. > > This patch has a problem: this design can easily cause the report of > data inconsistent with pg_replication_slots because the data returned > by pg_get_replication_slots() and pg_get_slot_invalidation_cause() > would happen across *two* function call contexts, no? So, it seems to > me that this had better be integrated as an extra column of > pg_get_replication_slots() to ensure the consistency of the data > reported. And it's critical to get consistent data for monitoring. > It depends on how one uses the function. For example, if one finds there is a conflict via pg_get_replication_slots() and wants to check the reason for the same via this new function then it would give the correct answer. Now, if we think it is okay to have two columns 'conflicting' and 'conflict_reason/conflict_cause' to be returned by pg_get_replication_slots() then that should serve the purpose as well but one can argue that 'conflicting' is deducible from 'conflict_reason'. The other thing we want to decide is how useful will it be to display this information via pg_replication_slots view considering we already have a boolean for it. Shall we add yet another column in the view and if so, it would probably be a string rather than an enum? Now, if we do so, we still want function pg_get_replication_slots() or pg_get_slot_invalidation_cause() to return an enum value as we want that to be synced/copied to standby. -- With Regards, Amit Kapila.
Re: Function to get invalidation cause of a replication slot.
On Wed, Dec 20, 2023 at 12:43:45PM +0100, Drouvot, Bertrand wrote: > + 3 = wal_level insufficient on the primary > server > > "." is missing at the end (to be consistent with 1 and 2). Same > in the commit message. > > + * Returns ReplicationSlotInvalidationCause enum value for valid slot_name; > > Not sure the sentence should finish with ";". > > Another Nit is to add a comment in ReplicationSlotInvalidationCause > definition (slot.h) > that any new enum values (if any) should be added after the ones that are > already defined (to > provide some consistency across changes in this area if any). > > Except the above Nit(s) the patch LGTM. This patch has a problem: this design can easily cause the report of data inconsistent with pg_replication_slots because the data returned by pg_get_replication_slots() and pg_get_slot_invalidation_cause() would happen across *two* function call contexts, no? So, it seems to me that this had better be integrated as an extra column of pg_get_replication_slots() to ensure the consistency of the data reported. And it's critical to get consistent data for monitoring. Not to mention that the lookup had better happen also while holding ReplicationSlotControlLock as well, which is also something InvalidatePossiblyObsoleteSlot() relies on. v2 ignores that. -- Michael signature.asc Description: PGP signature
Re: Function to get invalidation cause of a replication slot.
Hi, On 12/20/23 10:55 AM, shveta malik wrote: On Wed, Dec 20, 2023 at 2:20 PM Amit Kapila wrote: PFA v2 patch. Addressed below comments: 1) Added test in 019_replslot_limit.pl 2) 'pg_get_slot_invalidation_cause' now returns error if the given slot does not exist 3) Corrected doc and commit msg. Thanks! + 3 = wal_level insufficient on the primary server "." is missing at the end (to be consistent with 1 and 2). Same in the commit message. + * Returns ReplicationSlotInvalidationCause enum value for valid slot_name; Not sure the sentence should finish with ";". Another Nit is to add a comment in ReplicationSlotInvalidationCause definition (slot.h) that any new enum values (if any) should be added after the ones that are already defined (to provide some consistency across changes in this area if any). Except the above Nit(s) the patch LGTM. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Function to get invalidation cause of a replication slot.
On Wed, Dec 20, 2023 at 4:10 PM Drouvot, Bertrand wrote: > > On 12/20/23 9:50 AM, Amit Kapila wrote: > > On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand > > wrote: > >> > > > >> 4 === > >> > >> + * Returns ReplicationSlotInvalidationCause enum value for valid > >> slot_name; > >> > >> I think it would be more user friendly to return a description instead of > >> an enum value. > >> > > But it would be tricky to use at a place where we want to know the > > enum value as in the case of the sync slots patch where we want to > > copy the cause. > > Yeah right, what about displaying both then? (one field for the enum and one > for > the description). I feel it's not friendly to ask users to refer to the > documentation > (or the commit message) to get the meaning of the output. > > Another option could be to create a new view, say > pg_slot_invalidation_causes, that would list > them all (cause_id, cause_description) and then keep > pg_get_slot_invalidation_cause() returning > the cause_id only. > I am not sure if it is really valuable enough to have a separate view but if others also feel that would be a good idea then we can do it. I feel for the current purpose having a function as proposed in the patch is good enough, we can always extend it or add a new view dependening on if some users really care about it. -- With Regards, Amit Kapila.
Re: Function to get invalidation cause of a replication slot.
Hi, On 12/20/23 10:57 AM, shveta malik wrote: On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand Worth to add some coverage in 019_replslot_limit.pl and 035_standby_logical_decoding.pl? I have recently added a test in 019_replslot_limit.pl in v2. Thanks! Do you suggest adding in 035_standby_logical_decoding as well? Will it have additional benefits? That would cover the remaining invalidation causes but I'm not sure that's worth it as this new function is simple enough after all. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Function to get invalidation cause of a replication slot.
Hi, On 12/20/23 9:50 AM, Amit Kapila wrote: On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand wrote: On 12/20/23 7:01 AM, shveta malik wrote: Hello hackers, Attached is a patch which attempts to implement a new system function pg_get_slot_invalidation_cause('slot_name') to get invalidation cause of a replication slot. Thanks! +1 for the idea to display this information through an SQL API. I wonder if we could add a new field to pg_replication_slots instead of creating a new function. Yeah, that is another option. We already have a boolean column 'conflicting' in pg_replication_slots, so are you suggesting adding a new column like 'conflict_cause'? Yes. I feel it is okay to have an SQL function for this as it may be primarily used for internal purposes or for some specific users who want to dig deeper for the invalidation cause. 4 === + * Returns ReplicationSlotInvalidationCause enum value for valid slot_name; I think it would be more user friendly to return a description instead of an enum value. But it would be tricky to use at a place where we want to know the enum value as in the case of the sync slots patch where we want to copy the cause. Yeah right, what about displaying both then? (one field for the enum and one for the description). I feel it's not friendly to ask users to refer to the documentation (or the commit message) to get the meaning of the output. Another option could be to create a new view, say pg_slot_invalidation_causes, that would list them all (cause_id, cause_description) and then keep pg_get_slot_invalidation_cause() returning the cause_id only. Also I think we should add a comment in ReplicationSlotInvalidationCause definition (slot.h) that any new enum values (if any) should be added after the ones that are already defined (to provide some consistency across changes in this area if any). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Function to get invalidation cause of a replication slot.
On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand wrote: > > Hi, > > On 12/20/23 7:01 AM, shveta malik wrote: > > Hello hackers, > > > > Attached is a patch which attempts to implement a new system function > > pg_get_slot_invalidation_cause('slot_name') to get invalidation cause > > of a replication slot. > > Thanks! +1 for the idea to display this information through an SQL API. > > I wonder if we could add a new field to pg_replication_slots instead > of creating a new function. > > > One of the use case scenarios for this function is another ongoing > > thread "Synchronizing slots from primary to standby" at [1]. This > > function is needed there to replicate invalidation-cause of a logical > > replication slot from the primary server to the hot standby. But this > > is an independent requirement in itself and thus makes sense to have > > it implemented separately. > > Agree. > > My thoughts about it: > > 1 === > > "Currently, users do not have a way to know the invalidation cause" > > I'm not sure about it, I think one could see the reasons in the log file. > > 2 === > > "This function returns NULL if the replication slot is not found" > > Why not returning an error instead? (like pg_drop_replication_slot() does for > example) > > FWIW, we'd not need to cover this case if the description would be added to > pg_replication_slots. > > 3 === > > + > + 3 = wal_level insufficient for the slot > + > > wal_level insufficient on the primary maybe? > > 4 === > > + * Returns ReplicationSlotInvalidationCause enum value for valid slot_name; > > I think it would be more user friendly to return a description instead of an > enum value. > > 5 === > > doc/src/sgml/func.sgml | 33 + > src/backend/replication/slotfuncs.c | 27 +++ > src/include/catalog/pg_proc.dat | 4 > > Worth to add some coverage in 019_replslot_limit.pl and > 035_standby_logical_decoding.pl? I have recently added a test in 019_replslot_limit.pl in v2. Do you suggest adding in 035_standby_logical_decoding as well? Will it have additional benefits? thanks Shveta
Re: Function to get invalidation cause of a replication slot.
On Wed, Dec 20, 2023 at 2:20 PM Amit Kapila wrote: > > On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand > wrote: > > > > On 12/20/23 7:01 AM, shveta malik wrote: > > > Hello hackers, > > > > > > Attached is a patch which attempts to implement a new system function > > > pg_get_slot_invalidation_cause('slot_name') to get invalidation cause > > > of a replication slot. > > > > Thanks! +1 for the idea to display this information through an SQL API. > > > > I wonder if we could add a new field to pg_replication_slots instead > > of creating a new function. > > > > Yeah, that is another option. We already have a boolean column > 'conflicting' in pg_replication_slots, so are you suggesting adding a > new column like 'conflict_cause'? I feel it is okay to have an SQL > function for this as it may be primarily used for internal purposes or > for some specific users who want to dig deeper for the invalidation > cause. > > > > One of the use case scenarios for this function is another ongoing > > > thread "Synchronizing slots from primary to standby" at [1]. This > > > function is needed there to replicate invalidation-cause of a logical > > > replication slot from the primary server to the hot standby. But this > > > is an independent requirement in itself and thus makes sense to have > > > it implemented separately. > > > > Agree. > > > > My thoughts about it: > > > > 1 === > > > > "Currently, users do not have a way to know the invalidation cause" > > > > I'm not sure about it, I think one could see the reasons in the log file. > > > > 2 === > > > > "This function returns NULL if the replication slot is not found" > > > > Why not returning an error instead? (like pg_drop_replication_slot() does > > for example) > > > > +1. Also, the check for NULL argument should be there. If arg is NULL, it will not come to this function call. It comes to this function if the signature matches. That is why other functions like pg_drop_replication_slot(), pg_replication_slot_advance() etc do not have NULL arg check as well. > > > FWIW, we'd not need to cover this case if the description would be added to > > pg_replication_slots. > > > > 3 === > > > > + > > + 3 = wal_level insufficient for the slot > > + > > > > wal_level insufficient on the primary maybe? > > > > 4 === > > > > + * Returns ReplicationSlotInvalidationCause enum value for valid slot_name; > > > > I think it would be more user friendly to return a description instead of > > an enum value. > > > > But it would be tricky to use at a place where we want to know the > enum value as in the case of the sync slots patch where we want to > copy the cause. I think it would be better to display the description > if we want to display it in the view. > > -- > With Regards, > Amit Kapila. PFA v2 patch. Addressed below comments: 1) Added test in 019_replslot_limit.pl 2) 'pg_get_slot_invalidation_cause' now returns error if the given slot does not exist 3) Corrected doc and commit msg. thanks Shveta v2-0001-Function-to-get-invalidation-cause-of-a-replicati.patch Description: Binary data
Re: Function to get invalidation cause of a replication slot.
On Wed, Dec 20, 2023 at 12:46 PM Drouvot, Bertrand wrote: > > On 12/20/23 7:01 AM, shveta malik wrote: > > Hello hackers, > > > > Attached is a patch which attempts to implement a new system function > > pg_get_slot_invalidation_cause('slot_name') to get invalidation cause > > of a replication slot. > > Thanks! +1 for the idea to display this information through an SQL API. > > I wonder if we could add a new field to pg_replication_slots instead > of creating a new function. > Yeah, that is another option. We already have a boolean column 'conflicting' in pg_replication_slots, so are you suggesting adding a new column like 'conflict_cause'? I feel it is okay to have an SQL function for this as it may be primarily used for internal purposes or for some specific users who want to dig deeper for the invalidation cause. > > One of the use case scenarios for this function is another ongoing > > thread "Synchronizing slots from primary to standby" at [1]. This > > function is needed there to replicate invalidation-cause of a logical > > replication slot from the primary server to the hot standby. But this > > is an independent requirement in itself and thus makes sense to have > > it implemented separately. > > Agree. > > My thoughts about it: > > 1 === > > "Currently, users do not have a way to know the invalidation cause" > > I'm not sure about it, I think one could see the reasons in the log file. > > 2 === > > "This function returns NULL if the replication slot is not found" > > Why not returning an error instead? (like pg_drop_replication_slot() does for > example) > +1. Also, the check for NULL argument should be there. > FWIW, we'd not need to cover this case if the description would be added to > pg_replication_slots. > > 3 === > > + > + 3 = wal_level insufficient for the slot > + > > wal_level insufficient on the primary maybe? > > 4 === > > + * Returns ReplicationSlotInvalidationCause enum value for valid slot_name; > > I think it would be more user friendly to return a description instead of an > enum value. > But it would be tricky to use at a place where we want to know the enum value as in the case of the sync slots patch where we want to copy the cause. I think it would be better to display the description if we want to display it in the view. -- With Regards, Amit Kapila.
Re: Function to get invalidation cause of a replication slot.
Hi, On 12/20/23 7:01 AM, shveta malik wrote: Hello hackers, Attached is a patch which attempts to implement a new system function pg_get_slot_invalidation_cause('slot_name') to get invalidation cause of a replication slot. Thanks! +1 for the idea to display this information through an SQL API. I wonder if we could add a new field to pg_replication_slots instead of creating a new function. One of the use case scenarios for this function is another ongoing thread "Synchronizing slots from primary to standby" at [1]. This function is needed there to replicate invalidation-cause of a logical replication slot from the primary server to the hot standby. But this is an independent requirement in itself and thus makes sense to have it implemented separately. Agree. My thoughts about it: 1 === "Currently, users do not have a way to know the invalidation cause" I'm not sure about it, I think one could see the reasons in the log file. 2 === "This function returns NULL if the replication slot is not found" Why not returning an error instead? (like pg_drop_replication_slot() does for example) FWIW, we'd not need to cover this case if the description would be added to pg_replication_slots. 3 === + + 3 = wal_level insufficient for the slot + wal_level insufficient on the primary maybe? 4 === + * Returns ReplicationSlotInvalidationCause enum value for valid slot_name; I think it would be more user friendly to return a description instead of an enum value. 5 === doc/src/sgml/func.sgml | 33 + src/backend/replication/slotfuncs.c | 27 +++ src/include/catalog/pg_proc.dat | 4 Worth to add some coverage in 019_replslot_limit.pl and 035_standby_logical_decoding.pl? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Function to get invalidation cause of a replication slot.
Hello hackers, Attached is a patch which attempts to implement a new system function pg_get_slot_invalidation_cause('slot_name') to get invalidation cause of a replication slot. Currently, users do not have a way to know the invalidation cause. One can only find out if the slot is invalidated or not by querying the 'conflicting' field of pg_replication_slots. This new function provides a way to query invalidation cause as well. One of the use case scenarios for this function is another ongoing thread "Synchronizing slots from primary to standby" at [1]. This function is needed there to replicate invalidation-cause of a logical replication slot from the primary server to the hot standby. But this is an independent requirement in itself and thus makes sense to have it implemented separately. Please review and provide your feedback. [1]: https://www.postgresql.org/message-id/514f6f2f-6833-4539-39f1-96cd1e011f23%40enterprisedb.com thanks Shveta v1-0001-Function-to-get-invalidation-cause-of-a-replicati.patch Description: Binary data