Re: Is Recovery actually paused?
On Fri, 12 Mar 2021 at 2:04 AM, Robert Haas wrote: > On Mon, Mar 1, 2021 at 12:08 AM Dilip Kumar wrote: > > > > > I'll wait for a day before marking this RfC in case anyone have > > > > > further comments. > > > > > > > > Okay. > > > > > > Hearing nothing, done that. > > > > Thanks. > > Committed with minor cosmetic changes. Thanks Robert. > -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Mon, Mar 1, 2021 at 12:08 AM Dilip Kumar wrote: > > > > I'll wait for a day before marking this RfC in case anyone have > > > > further comments. > > > > > > Okay. > > > > Hearing nothing, done that. > > Thanks. Committed with minor cosmetic changes. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Fri, Feb 26, 2021 at 1:33 PM Kyotaro Horiguchi wrote: > > At Thu, 25 Feb 2021 13:22:53 +0530, Dilip Kumar wrote > in > > On Thu, Feb 25, 2021 at 12:42 PM Kyotaro Horiguchi > > wrote: > > > The latest version applies (almost) cleanly to the current master and > > > works fine. > > > I don't have further comment on this. > > > > > > I'll wait for a day before marking this RfC in case anyone have > > > further comments. > > > > Okay. > > Hearing nothing, done that. Thanks. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
At Thu, 25 Feb 2021 13:22:53 +0530, Dilip Kumar wrote in > On Thu, Feb 25, 2021 at 12:42 PM Kyotaro Horiguchi > wrote: > > The latest version applies (almost) cleanly to the current master and > > works fine. > > I don't have further comment on this. > > > > I'll wait for a day before marking this RfC in case anyone have > > further comments. > > Okay. Hearing nothing, done that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Thu, Feb 25, 2021 at 12:42 PM Kyotaro Horiguchi wrote: > > Thanks for your patience and sorry for having annoyed you. Thank you very much for your review and inputs. > The latest version applies (almost) cleanly to the current master and > works fine. > I don't have further comment on this. > > I'll wait for a day before marking this RfC in case anyone have > further comments. Okay. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
At Thu, 25 Feb 2021 09:49:15 +0530, Dilip Kumar wrote in > On Thu, Feb 25, 2021 at 6:52 AM Kyotaro Horiguchi > wrote: > > > Recently we have mildly changed to the direction to utilize the > > compiler warning about enum coverage in switch struct. (Maybe we need > > another compiler option that enables that check for switch'es with the > > default case, though.) In that light, the direction is a switch > > without the default case then Assert if none of the cases is stepped > > on. This is what apply_dispatch does. Slightly different version of > > the same would be the following. This is more natural than the above. > > > > statestr = NULL; > > swtich(state) > > { > > case RECOVERY_NOT_PAUSED: > > statestr = "not paused"; > > break; > > ... > > } > > > > Assert (statestr != NULL); > > return cstring_to_text(statestr); > > > > If the enum had many (more than ten or so?) values and it didn't seem > > stable I push that a bit strongly but it actually consists of only > > three values and not likely to get further values. So I don't insist > > on the style so strongly here. > > > > Changed as per the suggestion. Thanks for your patience and sorry for having annoyed you. The latest version applies (almost) cleanly to the current master and works fine. I don't have further comment on this. I'll wait for a day before marking this RfC in case anyone have further comments. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Thu, Feb 25, 2021 at 6:52 AM Kyotaro Horiguchi wrote: > Recently we have mildly changed to the direction to utilize the > compiler warning about enum coverage in switch struct. (Maybe we need > another compiler option that enables that check for switch'es with the > default case, though.) In that light, the direction is a switch > without the default case then Assert if none of the cases is stepped > on. This is what apply_dispatch does. Slightly different version of > the same would be the following. This is more natural than the above. > > statestr = NULL; > swtich(state) > { > case RECOVERY_NOT_PAUSED: > statestr = "not paused"; > break; > ... > } > > Assert (statestr != NULL); > return cstring_to_text(statestr); > > If the enum had many (more than ten or so?) values and it didn't seem > stable I push that a bit strongly but it actually consists of only > three values and not likely to get further values. So I don't insist > on the style so strongly here. > Changed as per the suggestion. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From 3a5848b49a808ed5cde0f1d38bcaf0bba112cc54 Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Wed, 27 Jan 2021 16:46:04 +0530 Subject: [PATCH v16] Provide a new interface to get the recovery pause status Currently, pg_is_wal_replay_paused, just checks whether the recovery pause is requested or not but it doesn't actually tell whether the recovery is actually paused or not. So basically there is no way for the user to know the actual status of the pause request. This patch provides a new interface pg_get_wal_replay_pause_state that will return the actual status of the recovery pause i.e.'not paused' if pause is not requested, 'pause requested' if pause is requested but recovery is not yet paused and 'paused' if recovery is actually paused. --- doc/src/sgml/func.sgml | 32 +++-- src/backend/access/transam/xlog.c | 86 -- src/backend/access/transam/xlogfuncs.c | 45 +- src/include/access/xlog.h | 10 +++- src/include/catalog/pg_proc.dat| 4 ++ 5 files changed, 155 insertions(+), 22 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d822427..500bc9b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); boolean -Returns true if recovery is paused. +Returns true if recovery pause is requested. + + + + + + + pg_get_wal_replay_pause_state + +pg_get_wal_replay_pause_state () +text + + +Returns recovery pause state. The return values are +not paused if pause is not requested, +pause requested if pause is requested but recovery is +not yet paused and, paused if the recovery is +actually paused. @@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); void -Pauses recovery. While recovery is paused, no further database -changes are applied. If hot standby is active, all new queries will -see the same consistent snapshot of the database, and no further query -conflicts will be generated until recovery is resumed. +Request to pause recovery. A request doesn't mean that recovery stops +right away. If you want a guarantee that recovery is actually paused, +you need to check for the recovery pause state returned by +pg_get_wal_replay_pause_state(). Note that +pg_is_wal_replay_paused() returns whether a request +is made. While recovery is paused, no further database changes are applied. +If hot standby is active, all new queries will see the same consistent +snapshot of the database, and no further query conflicts will be generated +until recovery is resumed. This function is restricted to superusers by default, but other users diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e0c37f7..5c94be7 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -721,8 +721,8 @@ typedef struct XLogCtlData * only relevant for replication or archive recovery */ TimestampTz currentChunkStartTime; - /* Are we requested to pause recovery? */ - bool recoveryPause; + /* Recovery pause state */ + RecoveryPauseState recoveryPauseState; /* * lastFpwDisableRecPtr points to the start of the last replayed @@ -894,6 +894,7 @@ static void validateRecoveryParameters(void); static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog); static bool recoveryStopsBefore(XLogReaderState *record); static bool recoveryStopsAfter(XLogReaderState *record); +s
Re: Is Recovery actually paused?
At Wed, 24 Feb 2021 15:25:57 +0530, Dilip Kumar wrote in > On Wed, Feb 24, 2021 at 2:26 PM Kyotaro Horiguchi > wrote: > > I should have took the meaning of "confirm" wrongly. I took that as > > "somehow determine if the recovery is to be paused". If that reading > > is completely wrong, I don't mind either re-chaging the function name > > or leaving all it alone. > > I am fine with leaving it the way it is unless someone feels that we > should change it. Understood. I don't stick to the change. > > > > This disables the static enum coverage check and it is not likely to > > > > have a wrong value here, other than the case of shared memory > > > > corruption. So we can remove the default case > > > > here. pg_get_replication_slots() is going that direction and > > > > apply_dispatch() is taking a slightly different way. Anyway I think > > > > that we can take away the default case. > > > > > > So do you think we should put an assert(0) in the default case? > > > > No. Just removing the default in the switch. If the value comes from > > some other source typically from disk or user-interraction, the > > default is necessary, but, in the first place if we have other than > > the defined value there, it is a sign of something worse than > > ERROR. If we care about that case, we *could* do the same thing with > > apply_dispatch(). > > > > switch (GetRecoveryPauseState()) > > { > > case RECOVERY_NOT_PAUSED: > > return cstring_to_text("not paused"); > > .. > >} > > > >/* we shouldn't reach here */ > >Assert (0); > > I think for such cases IMHO the preferred style for PostgreSQL is that > we add Assert(0) in the default case, at least it appeared to me that > way. Recently we have mildly changed to the direction to utilize the compiler warning about enum coverage in switch struct. (Maybe we need another compiler option that enables that check for switch'es with the default case, though.) In that light, the direction is a switch without the default case then Assert if none of the cases is stepped on. This is what apply_dispatch does. Slightly different version of the same would be the following. This is more natural than the above. statestr = NULL; swtich(state) { case RECOVERY_NOT_PAUSED: statestr = "not paused"; break; ... } Assert (statestr != NULL); return cstring_to_text(statestr); If the enum had many (more than ten or so?) values and it didn't seem stable I push that a bit strongly but it actually consists of only three values and not likely to get further values. So I don't insist on the style so strongly here. In short, I'm fine with default: Assert(0) if you still don't like the just above. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Wed, Feb 24, 2021 at 3:25 PM Dilip Kumar wrote: > > > > > The reason for the checkpoint is to move to "paused" state in a > > > > reasonable time. I think we need to mention that reason rather than > > > > what is done here. > > > > > > I will do that. I have fixed this. > > > > > > > > + /* get the recovery pause state */ > > > > + switch(GetRecoveryPauseState()) > > > > + { > > > > + case RECOVERY_NOT_PAUSED: > > > > + state = "not paused"; > > > > + break; > > > > ... > > > > + default: > > > > + elog(ERROR, "invalid recovery pause state"); > > I think for such cases IMHO the preferred style for PostgreSQL is that > we add Assert(0) in the default case, at least it appeared to me that > way. Added an Assert(0) in default case. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From 5eef4c3d7e2aa3c2858031566f2da3b0075e361a Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Wed, 27 Jan 2021 16:46:04 +0530 Subject: [PATCH v15] Provide a new interface to get the recovery pause status Currently, pg_is_wal_replay_paused, just checks whether the recovery pause is requested or not but it doesn't actually tell whether the recovery is actually paused or not. So basically there is no way for the user to know the actual status of the pause request. This patch provides a new interface pg_get_wal_replay_pause_state that will return the actual status of the recovery pause i.e.'not paused' if pause is not requested, 'pause requested' if pause is requested but recovery is not yet paused and 'paused' if recovery is actually paused. --- doc/src/sgml/func.sgml | 32 +++-- src/backend/access/transam/xlog.c | 86 -- src/backend/access/transam/xlogfuncs.c | 47 ++- src/include/access/xlog.h | 10 +++- src/include/catalog/pg_proc.dat| 4 ++ 5 files changed, 157 insertions(+), 22 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d822427..500bc9b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); boolean -Returns true if recovery is paused. +Returns true if recovery pause is requested. + + + + + + + pg_get_wal_replay_pause_state + +pg_get_wal_replay_pause_state () +text + + +Returns recovery pause state. The return values are +not paused if pause is not requested, +pause requested if pause is requested but recovery is +not yet paused and, paused if the recovery is +actually paused. @@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); void -Pauses recovery. While recovery is paused, no further database -changes are applied. If hot standby is active, all new queries will -see the same consistent snapshot of the database, and no further query -conflicts will be generated until recovery is resumed. +Request to pause recovery. A request doesn't mean that recovery stops +right away. If you want a guarantee that recovery is actually paused, +you need to check for the recovery pause state returned by +pg_get_wal_replay_pause_state(). Note that +pg_is_wal_replay_paused() returns whether a request +is made. While recovery is paused, no further database changes are applied. +If hot standby is active, all new queries will see the same consistent +snapshot of the database, and no further query conflicts will be generated +until recovery is resumed. This function is restricted to superusers by default, but other users diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index e0c37f7..5c94be7 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -721,8 +721,8 @@ typedef struct XLogCtlData * only relevant for replication or archive recovery */ TimestampTz currentChunkStartTime; - /* Are we requested to pause recovery? */ - bool recoveryPause; + /* Recovery pause state */ + RecoveryPauseState recoveryPauseState; /* * lastFpwDisableRecPtr points to the start of the last replayed @@ -894,6 +894,7 @@ static void validateRecoveryParameters(void); static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog); static bool recoveryStopsBefore(XLogReaderState *record); static bool recoveryStopsAfter(XLogReaderState *record); +static void ConfirmRecoveryPaused(void); static void recoveryPausesHere(bool endOfRecovery); static bool recoveryApplyDelay(XLogReaderState *record); static void SetLatestXTime(Tim
Re: Is Recovery actually paused?
On Wed, Feb 24, 2021 at 2:26 PM Kyotaro Horiguchi wrote: > > At Wed, 24 Feb 2021 13:15:27 +0530, Dilip Kumar wrote > in > > On Wed, Feb 24, 2021 at 12:39 PM Kyotaro Horiguchi > > wrote: > > > > > > At Tue, 23 Feb 2021 12:03:32 +0530, Dilip Kumar > > > wrote in > > > > On Fri, Feb 12, 2021 at 3:26 AM Robert Haas > > > > wrote: > > > How about something like this? > > > > > > Request to pause recovery. Server actually stops recovery at a > > > convenient time. This can take a few seconds after the request. If you > > > need to strictly guarantee that no further database change will occur, > > > you can check using pg_get_wal_replay_ause_state(). Note that > > > pg_is_wal_replay_paused() may return true before recovery actually > > > stops. > > > > I still think that for the user-facing documentation purpose the > > current paragraph looks better. > > Ok. > > > > The patch adds two loops whth the following logic: > > > > > > while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) > > > { > > >... > > >ConfirmRecoveryPaused(); > > > > > > } > > > > > > After the renaming of the function, the following structure looks > > > simpler and more natural. > > > > > > while (ConfirmRecoveryPaused()) > > > { > > >... > > > > > > } > > > > So do you mean that if the pause is requested ConfirmRecoveryPaused > > will set it to paused and if it is not paused then it will return > > false? With the current function name, I don't think that will look > > clean maybe we should change the name to something like > > CheckAndConfirmRecoveryPaused? Or I am fine with the way it is now. > > Any other thoughts? > > I should have took the meaning of "confirm" wrongly. I took that as > "somehow determine if the recovery is to be paused". If that reading > is completely wrong, I don't mind either re-chaging the function name > or leaving all it alone. I am fine with leaving it the way it is unless someone feels that we should change it. > > > > > > + /* test for recovery pause, if user has requested the > > > pause */ > > > + if (((volatile XLogCtlData *) > > > XLogCtl)->recoveryPauseState != > > > > > > The reason for the checkpoint is to move to "paused" state in a > > > reasonable time. I think we need to mention that reason rather than > > > what is done here. > > > > I will do that. > > Thanks. > > > > > > > + /* get the recovery pause state */ > > > + switch(GetRecoveryPauseState()) > > > + { > > > + case RECOVERY_NOT_PAUSED: > > > + state = "not paused"; > > > + break; > > > ... > > > + default: > > > + elog(ERROR, "invalid recovery pause state"); > > > > > > This disables the static enum coverage check and it is not likely to > > > have a wrong value here, other than the case of shared memory > > > corruption. So we can remove the default case > > > here. pg_get_replication_slots() is going that direction and > > > apply_dispatch() is taking a slightly different way. Anyway I think > > > that we can take away the default case. > > > > So do you think we should put an assert(0) in the default case? > > No. Just removing the default in the switch. If the value comes from > some other source typically from disk or user-interraction, the > default is necessary, but, in the first place if we have other than > the defined value there, it is a sign of something worse than > ERROR. If we care about that case, we *could* do the same thing with > apply_dispatch(). > > switch (GetRecoveryPauseState()) > { > case RECOVERY_NOT_PAUSED: > return cstring_to_text("not paused"); > .. >} > >/* we shouldn't reach here */ >Assert (0); I think for such cases IMHO the preferred style for PostgreSQL is that we add Assert(0) in the default case, at least it appeared to me that way. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
At Wed, 24 Feb 2021 17:56:41 +0900 (JST), Kyotaro Horiguchi wrote in > At Wed, 24 Feb 2021 13:15:27 +0530, Dilip Kumar wrote > in > > > After the renaming of the function, the following structure looks > > > simpler and more natural. > > > > > > while (ConfirmRecoveryPaused()) > > > { > > >... > > > > > > } > > > > So do you mean that if the pause is requested ConfirmRecoveryPaused > > will set it to paused and if it is not paused then it will return > > false? With the current function name, I don't think that will look > > clean maybe we should change the name to something like > > CheckAndConfirmRecoveryPaused? Or I am fine with the way it is now. > > Any other thoughts? > > I should have took the meaning of "confirm" wrongly. I took that as > "somehow determine if the recovery is to be paused". If that reading > is completely wrong, I don't mind either re-chaging the function name > or leaving all it alone. Ouch. If we choose to re-rename it, it won't be "CheckAnd...". RecoveryIsPaused() is used for another meaning. Maybe RecoveryPauseTriggerd() or such? (I'm not sure, sorry..) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
At Wed, 24 Feb 2021 13:15:27 +0530, Dilip Kumar wrote in > On Wed, Feb 24, 2021 at 12:39 PM Kyotaro Horiguchi > wrote: > > > > At Tue, 23 Feb 2021 12:03:32 +0530, Dilip Kumar > > wrote in > > > On Fri, Feb 12, 2021 at 3:26 AM Robert Haas wrote: > > How about something like this? > > > > Request to pause recovery. Server actually stops recovery at a > > convenient time. This can take a few seconds after the request. If you > > need to strictly guarantee that no further database change will occur, > > you can check using pg_get_wal_replay_ause_state(). Note that > > pg_is_wal_replay_paused() may return true before recovery actually > > stops. > > I still think that for the user-facing documentation purpose the > current paragraph looks better. Ok. > > The patch adds two loops whth the following logic: > > > > while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) > > { > >... > >ConfirmRecoveryPaused(); > > > > } > > > > After the renaming of the function, the following structure looks > > simpler and more natural. > > > > while (ConfirmRecoveryPaused()) > > { > >... > > > > } > > So do you mean that if the pause is requested ConfirmRecoveryPaused > will set it to paused and if it is not paused then it will return > false? With the current function name, I don't think that will look > clean maybe we should change the name to something like > CheckAndConfirmRecoveryPaused? Or I am fine with the way it is now. > Any other thoughts? I should have took the meaning of "confirm" wrongly. I took that as "somehow determine if the recovery is to be paused". If that reading is completely wrong, I don't mind either re-chaging the function name or leaving all it alone. > > > > + /* test for recovery pause, if user has requested the pause > > */ > > + if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState > > != > > > > The reason for the checkpoint is to move to "paused" state in a > > reasonable time. I think we need to mention that reason rather than > > what is done here. > > I will do that. Thanks. > > > > + /* get the recovery pause state */ > > + switch(GetRecoveryPauseState()) > > + { > > + case RECOVERY_NOT_PAUSED: > > + state = "not paused"; > > + break; > > ... > > + default: > > + elog(ERROR, "invalid recovery pause state"); > > > > This disables the static enum coverage check and it is not likely to > > have a wrong value here, other than the case of shared memory > > corruption. So we can remove the default case > > here. pg_get_replication_slots() is going that direction and > > apply_dispatch() is taking a slightly different way. Anyway I think > > that we can take away the default case. > > So do you think we should put an assert(0) in the default case? No. Just removing the default in the switch. If the value comes from some other source typically from disk or user-interraction, the default is necessary, but, in the first place if we have other than the defined value there, it is a sign of something worse than ERROR. If we care about that case, we *could* do the same thing with apply_dispatch(). switch (GetRecoveryPauseState()) { case RECOVERY_NOT_PAUSED: return cstring_to_text("not paused"); .. } /* we shouldn't reach here */ Assert (0); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Wed, Feb 24, 2021 at 12:39 PM Kyotaro Horiguchi wrote: > > At Tue, 23 Feb 2021 12:03:32 +0530, Dilip Kumar wrote > in > > On Fri, Feb 12, 2021 at 3:26 AM Robert Haas wrote: > > > There might be some more to say here, but those are things I notice on > > > a first read-through. > > > > Okay. > > It seems to me all the suggestions are addressed in this version. > > +Request to pause recovery. A request doesn't mean that recovery > stops > +right away. If you want a guarantee that recovery is actually > paused, > +you need to check for the recovery pause state returned by > +pg_get_wal_replay_pause_state(). Note that > +pg_is_wal_replay_paused() returns whether a > request > +is made. While recovery is paused, no further database changes are > applied. > > This looks like explainig the same thing twice. ("A request doesn't > mean.." and "While recovery is paused, ...") > > How about something like this? > > Request to pause recovery. Server actually stops recovery at a > convenient time. This can take a few seconds after the request. If you > need to strictly guarantee that no further database change will occur, > you can check using pg_get_wal_replay_ause_state(). Note that > pg_is_wal_replay_paused() may return true before recovery actually > stops. I still think that for the user-facing documentation purpose the current paragraph looks better. > The patch adds two loops whth the following logic: > > while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) > { >... >ConfirmRecoveryPaused(); > > } > > After the renaming of the function, the following structure looks > simpler and more natural. > > while (ConfirmRecoveryPaused()) > { >... > > } So do you mean that if the pause is requested ConfirmRecoveryPaused will set it to paused and if it is not paused then it will return false? With the current function name, I don't think that will look clean maybe we should change the name to something like CheckAndConfirmRecoveryPaused? Or I am fine with the way it is now. Any other thoughts? > > + /* test for recovery pause, if user has requested the pause */ > + if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState != > > The reason for the checkpoint is to move to "paused" state in a > reasonable time. I think we need to mention that reason rather than > what is done here. I will do that. > > + /* get the recovery pause state */ > + switch(GetRecoveryPauseState()) > + { > + case RECOVERY_NOT_PAUSED: > + state = "not paused"; > + break; > ... > + default: > + elog(ERROR, "invalid recovery pause state"); > > This disables the static enum coverage check and it is not likely to > have a wrong value here, other than the case of shared memory > corruption. So we can remove the default case > here. pg_get_replication_slots() is going that direction and > apply_dispatch() is taking a slightly different way. Anyway I think > that we can take away the default case. So do you think we should put an assert(0) in the default case? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
At Tue, 23 Feb 2021 12:03:32 +0530, Dilip Kumar wrote in > On Fri, Feb 12, 2021 at 3:26 AM Robert Haas wrote: > > There might be some more to say here, but those are things I notice on > > a first read-through. > > Okay. It seems to me all the suggestions are addressed in this version. +Request to pause recovery. A request doesn't mean that recovery stops +right away. If you want a guarantee that recovery is actually paused, +you need to check for the recovery pause state returned by +pg_get_wal_replay_pause_state(). Note that +pg_is_wal_replay_paused() returns whether a request +is made. While recovery is paused, no further database changes are applied. This looks like explainig the same thing twice. ("A request doesn't mean.." and "While recovery is paused, ...") How about something like this? Request to pause recovery. Server actually stops recovery at a convenient time. This can take a few seconds after the request. If you need to strictly guarantee that no further database change will occur, you can check using pg_get_wal_replay_ause_state(). Note that pg_is_wal_replay_paused() may return true before recovery actually stops. The patch adds two loops whth the following logic: while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) { ... ConfirmRecoveryPaused(); } After the renaming of the function, the following structure looks simpler and more natural. while (ConfirmRecoveryPaused()) { ... } + /* test for recovery pause, if user has requested the pause */ + if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState != The reason for the checkpoint is to move to "paused" state in a reasonable time. I think we need to mention that reason rather than what is done here. + /* get the recovery pause state */ + switch(GetRecoveryPauseState()) + { + case RECOVERY_NOT_PAUSED: + state = "not paused"; + break; ... + default: + elog(ERROR, "invalid recovery pause state"); This disables the static enum coverage check and it is not likely to have a wrong value here, other than the case of shared memory corruption. So we can remove the default case here. pg_get_replication_slots() is going that direction and apply_dispatch() is taking a slightly different way. Anyway I think that we can take away the default case. regard. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Fri, Feb 12, 2021 at 3:26 AM Robert Haas wrote: > > On Thu, Feb 11, 2021 at 6:07 AM Dilip Kumar wrote: > > > Thanks for the patch. I tested the new function and it works as > > > expected. I have no further comments on the v13 patch. > > > > Thanks for the review and testing. > > I don't see a whole lot wrong with this patch, but I think there are > some things that could make it a little clearer: Thanks for the review > - I suggest renaming CheckAndSetRecoveryPause() to ConfirmRecoveryPaused(). Yeah that make more sense so changed. > - I suggest moving the definition of that function to just after > SetRecoveryPause(). Done > - I suggest changing the argument to SetRecoveryPause() back to bool. > In the one place where you call SetRecoveryPause(RECOVERY_PAUSED), > just call SetRecoveryPause(true) and ConfirmRecoveryPaused() back to > back. Yeah done that way, I think only in once place we were doing SetRecoveryPause(RECOVERY_PAUSED), but after putting more thought I think that was not required because right after setting that we are having the while loop under that we have to call ConfirmRecoveryPaused. So I have changed that also as SetRecoveryPause(true) without immediate call of ConfirmRecoveryPaused. This in turn means that the "if" statement in > SetRecoveryPaused() can be rewritten as if (!recoveryPaused) > XLogCtl->recoveryPauseState = RECOVERY_NOT_PAUSED else if > (XLogCtl->recoveryPauseState == RECOVERY_NOT_PAUSED) > XLogCtl->recoveryPauseState = RECOVERY_PAUSE_REQUESTED(). This is > slightly less efficient, but I don't think it matters, and I think it > will be a lot more clear what's the job of SetRecoveryPause (say > whether we're trying to pause or not) and what's the job of > ConfirmRecoveryPaused (say whether we've succeeded in pausing). Done > - Since the numeric values of RecoveryPauseState don't matter and the > values are never visible to anything outside the server nor stored on > disk, I would be inclined to (a) not specify particular values in > xlog.h and (b) remove the test-and-elog in SetRecoveryPause(). Done > - In the places where you say: > > -if (((volatile XLogCtlData *) XLogCtl)->recoveryPause) > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > +RECOVERY_PAUSE_REQUESTED) > > ...I would suggest instead testing for != RECOVERY_NOT_PAUSED. Perhaps > we don't think RECOVERY_PAUSED can happen here. But if somehow it did, > calling recoveryPausesHere() would be right. Done > There might be some more to say here, but those are things I notice on > a first read-through. Okay. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From cf45d795946c0f33f7f29b533f175139e2a92583 Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Wed, 27 Jan 2021 16:46:04 +0530 Subject: [PATCH v14] Provide a new interface to get the recovery pause status Currently, pg_is_wal_replay_paused, just checks whether the recovery pause is requested or not but it doesn't actually tell whether the recovery is actually paused or not. So basically there is no way for the user to know the actual status of the pause request. This patch provides a new interface pg_get_wal_replay_pause_state that will return the actual status of the recovery pause i.e.'not paused' if pause is not requested, 'pause requested' if pause is requested but recovery is not yet paused and 'paused' if recovery is actually paused. --- doc/src/sgml/func.sgml | 32 +++--- src/backend/access/transam/xlog.c | 81 -- src/backend/access/transam/xlogfuncs.c | 46 ++- src/include/access/xlog.h | 10 - src/include/catalog/pg_proc.dat| 4 ++ 5 files changed, 151 insertions(+), 22 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index d822427..500bc9b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); boolean -Returns true if recovery is paused. +Returns true if recovery pause is requested. + + + + + + + pg_get_wal_replay_pause_state + +pg_get_wal_replay_pause_state () +text + + +Returns recovery pause state. The return values are +not paused if pause is not requested, +pause requested if pause is requested but recovery is +not yet paused and, paused if the recovery is +actually paused. @@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); void -Pauses recovery. While recovery is paused, no further database -changes are applied. If hot standby is active, all new queries will -see the same consistent snapshot of the database, and no further query -
Re: Is Recovery actually paused?
At Fri, 12 Feb 2021 13:33:32 +0900, Yugo NAGATA wrote in > I don't think that we need to include the waiting approach in > pg_get_wal_replay_pause_state > patch. However, Horiguchi-san's patch may be useful for some users who want > pg_wal_replay_pause to wait until recovery gets paused instead of polling the > state from applications. So, I shink we could discuss this patch in another > thread as another commitfest entry independent from > pg_get_wal_replay_pause_state. Since what I'm proposing is not making pg_wal_replay_pause() to wait, and no one seems on my side, I withdraw the proposal. > I have no futher comments on the v13 patch, too. Also, I agree with > Robert Haas's suggestions. Yeah, look reasonable. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Thu, 11 Feb 2021 16:36:55 +0530 Dilip Kumar wrote: > On Thu, Feb 11, 2021 at 3:20 PM Bharath Rupireddy > wrote: > > > > On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar wrote: > > > > > > On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar > > > wrote: > > > > > > > > I don't find any problem with this approach as well, but I personally > > > > feel that the other approach where we don't wait in any API and just > > > > return the recovery pause state is much simpler and more flexible. So > > > > I will make the pending changes in that patch and let's see what are > > > > the other opinion and based on that we can conclude. Thanks for the > > > > patch. I don't think that we need to include the waiting approach in pg_get_wal_replay_pause_state patch. However, Horiguchi-san's patch may be useful for some users who want pg_wal_replay_pause to wait until recovery gets paused instead of polling the state from applications. So, I shink we could discuss this patch in another thread as another commitfest entry independent from pg_get_wal_replay_pause_state. > > > Here is an updated version of the patch which fixes the last two open > > > problems > > > 1. In RecoveryRequiresIntParameter set the recovery pause state in the > > > loop so that if recovery resumed and pause requested again we can set > > > to pause again. > > > 2. If the recovery state is already 'paused' then don't set it back to > > > the 'pause requested'. > > > > > > One more point is that in 'pg_wal_replay_pause' even if we don't > > > change the state because it was already set to the 'paused' then also > > > we call the WakeupRecovery. But I don't think there is any problem > > > with that, if we think that this should be changed then we can make > > > SetRecoveryPause return a bool such that if it doesn't do state change > > > then it returns false and in that case we can avoid calling > > > WakeupRecovery, but I felt that is unnecessary. Any other thoughts on > > > this? > > > > IMO, that WakeupRecovery should not be a problem, because even now, if > > we issue a simple select pg_reload_conf(); (without even changing any > > config parameter), WakeupRecovery gets called. > > > > Thanks for the patch. I tested the new function and it works as > > expected. I have no further comments on the v13 patch. > > Thanks for the review and testing. I have no futher comments on the v13 patch, too. Also, I agree with Robert Haas's suggestions. Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Thu, Feb 11, 2021 at 6:07 AM Dilip Kumar wrote: > > Thanks for the patch. I tested the new function and it works as > > expected. I have no further comments on the v13 patch. > > Thanks for the review and testing. I don't see a whole lot wrong with this patch, but I think there are some things that could make it a little clearer: - I suggest renaming CheckAndSetRecoveryPause() to ConfirmRecoveryPaused(). - I suggest moving the definition of that function to just after SetRecoveryPause(). - I suggest changing the argument to SetRecoveryPause() back to bool. In the one place where you call SetRecoveryPause(RECOVERY_PAUSED), just call SetRecoveryPause(true) and ConfirmRecoveryPaused() back to back. This in turn means that the "if" statement in SetRecoveryPaused() can be rewritten as if (!recoveryPaused) XLogCtl->recoveryPauseState = RECOVERY_NOT_PAUSED else if (XLogCtl->recoveryPauseState == RECOVERY_NOT_PAUSED) XLogCtl->recoveryPauseState = RECOVERY_PAUSE_REQUESTED(). This is slightly less efficient, but I don't think it matters, and I think it will be a lot more clear what's the job of SetRecoveryPause (say whether we're trying to pause or not) and what's the job of ConfirmRecoveryPaused (say whether we've succeeded in pausing). - Since the numeric values of RecoveryPauseState don't matter and the values are never visible to anything outside the server nor stored on disk, I would be inclined to (a) not specify particular values in xlog.h and (b) remove the test-and-elog in SetRecoveryPause(). - In the places where you say: -if (((volatile XLogCtlData *) XLogCtl)->recoveryPause) +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == +RECOVERY_PAUSE_REQUESTED) ...I would suggest instead testing for != RECOVERY_NOT_PAUSED. Perhaps we don't think RECOVERY_PAUSED can happen here. But if somehow it did, calling recoveryPausesHere() would be right. There might be some more to say here, but those are things I notice on a first read-through. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Thu, Feb 11, 2021 at 3:20 PM Bharath Rupireddy wrote: > > On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar wrote: > > > > On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar wrote: > > > > > > I don't find any problem with this approach as well, but I personally > > > feel that the other approach where we don't wait in any API and just > > > return the recovery pause state is much simpler and more flexible. So > > > I will make the pending changes in that patch and let's see what are > > > the other opinion and based on that we can conclude. Thanks for the > > > patch. > > > > Here is an updated version of the patch which fixes the last two open > > problems > > 1. In RecoveryRequiresIntParameter set the recovery pause state in the > > loop so that if recovery resumed and pause requested again we can set > > to pause again. > > 2. If the recovery state is already 'paused' then don't set it back to > > the 'pause requested'. > > > > One more point is that in 'pg_wal_replay_pause' even if we don't > > change the state because it was already set to the 'paused' then also > > we call the WakeupRecovery. But I don't think there is any problem > > with that, if we think that this should be changed then we can make > > SetRecoveryPause return a bool such that if it doesn't do state change > > then it returns false and in that case we can avoid calling > > WakeupRecovery, but I felt that is unnecessary. Any other thoughts on > > this? > > IMO, that WakeupRecovery should not be a problem, because even now, if > we issue a simple select pg_reload_conf(); (without even changing any > config parameter), WakeupRecovery gets called. > > Thanks for the patch. I tested the new function and it works as > expected. I have no further comments on the v13 patch. Thanks for the review and testing. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar wrote: > > On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar wrote: > > > > I don't find any problem with this approach as well, but I personally > > feel that the other approach where we don't wait in any API and just > > return the recovery pause state is much simpler and more flexible. So > > I will make the pending changes in that patch and let's see what are > > the other opinion and based on that we can conclude. Thanks for the > > patch. > > Here is an updated version of the patch which fixes the last two open problems > 1. In RecoveryRequiresIntParameter set the recovery pause state in the > loop so that if recovery resumed and pause requested again we can set > to pause again. > 2. If the recovery state is already 'paused' then don't set it back to > the 'pause requested'. > > One more point is that in 'pg_wal_replay_pause' even if we don't > change the state because it was already set to the 'paused' then also > we call the WakeupRecovery. But I don't think there is any problem > with that, if we think that this should be changed then we can make > SetRecoveryPause return a bool such that if it doesn't do state change > then it returns false and in that case we can avoid calling > WakeupRecovery, but I felt that is unnecessary. Any other thoughts on > this? IMO, that WakeupRecovery should not be a problem, because even now, if we issue a simple select pg_reload_conf(); (without even changing any config parameter), WakeupRecovery gets called. Thanks for the patch. I tested the new function and it works as expected. I have no further comments on the v13 patch. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar wrote: > > I don't find any problem with this approach as well, but I personally > feel that the other approach where we don't wait in any API and just > return the recovery pause state is much simpler and more flexible. So > I will make the pending changes in that patch and let's see what are > the other opinion and based on that we can conclude. Thanks for the > patch. Here is an updated version of the patch which fixes the last two open problems 1. In RecoveryRequiresIntParameter set the recovery pause state in the loop so that if recovery resumed and pause requested again we can set to pause again. 2. If the recovery state is already 'paused' then don't set it back to the 'pause requested'. One more point is that in 'pg_wal_replay_pause' even if we don't change the state because it was already set to the 'paused' then also we call the WakeupRecovery. But I don't think there is any problem with that, if we think that this should be changed then we can make SetRecoveryPause return a bool such that if it doesn't do state change then it returns false and in that case we can avoid calling WakeupRecovery, but I felt that is unnecessary. Any other thoughts on this? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From 4f7f3e11c3c6354225ddb80bb06b503e940cd8ff Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Wed, 27 Jan 2021 16:46:04 +0530 Subject: [PATCH v13] Provide a new interface to get the recovery pause status Currently, pg_is_wal_replay_paused, just checks whether the recovery pause is requested or not but it doesn't actually tell whether the recovery is actually paused or not. So basically there is no way for the user to know the actual status of the pause request. This patch provides a new interface pg_get_wal_replay_pause_state that will return the actual status of the recovery pause i.e.'not paused' if pause is not requested, 'pause requested' if pause is requested but recovery is not yet paused and 'paused' if recovery is actually paused. --- doc/src/sgml/func.sgml | 32 +++-- src/backend/access/transam/xlog.c | 86 +++--- src/backend/access/transam/xlogfuncs.c | 50 ++-- src/include/access/xlog.h | 12 - src/include/catalog/pg_proc.dat| 4 ++ 5 files changed, 156 insertions(+), 28 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 1ab31a9..9b2c429 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); boolean -Returns true if recovery is paused. +Returns true if recovery pause is requested. + + + + + + + pg_get_wal_replay_pause_state + +pg_get_wal_replay_pause_state () +text + + +Returns recovery pause state. The return values are +not paused if pause is not requested, +pause requested if pause is requested but recovery is +not yet paused and, paused if the recovery is +actually paused. @@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); void -Pauses recovery. While recovery is paused, no further database -changes are applied. If hot standby is active, all new queries will -see the same consistent snapshot of the database, and no further query -conflicts will be generated until recovery is resumed. +Request to pause recovery. A request doesn't mean that recovery stops +right away. If you want a guarantee that recovery is actually paused, +you need to check for the recovery pause state returned by +pg_get_wal_replay_pause_state(). Note that +pg_is_wal_replay_paused() returns whether a request +is made. While recovery is paused, no further database changes are applied. +If hot standby is active, all new queries will see the same consistent +snapshot of the database, and no further query conflicts will be generated +until recovery is resumed. This function is restricted to superusers by default, but other users diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8e3b5df..459a19b 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -721,8 +721,8 @@ typedef struct XLogCtlData * only relevant for replication or archive recovery */ TimestampTz currentChunkStartTime; - /* Are we requested to pause recovery? */ - bool recoveryPause; + /* Recovery pause state */ + RecoveryPauseState recoveryPauseState; /* * lastFpwDisableRecPtr points to the start of the last replayed @@ -894,6 +894,7 @@ static void validateRecoveryParameters(void); st
Re: Is Recovery actually paused?
On Wed, Feb 10, 2021 at 8:19 AM Kyotaro Horiguchi wrote: > > At Tue, 9 Feb 2021 12:27:21 +0530, Bharath Rupireddy > wrote in > > What I meant was that if we were to add waiting logic inside > > pg_wal_replay_pause, we should also have a timeout with some default > > value, to avoid pg_wal_replay_pause waiting forever in the waiting > > loop. Within that timeout, if the recovery isn't paused, > > pg_wal_replay_pause will return probably a warning and a false(this > > requires us to change the return value of the existing > > pg_wal_replay_pause)? > > I thought that rm_redo finishes shortly unless any trouble > happens. But on second thought, I found that I forgot a case of a > recovery-conflict. So as you pointed out, pg_wal_replay_pause() needs > a flag 'wait' to wait for a pause established. And the flag can be > turned into "timeout". > > # And the prevous verision had another silly bug. > > > To avoid changing the existing API and return type, a new function > > pg_get_wal_replay_pause_state is introduced. > > I mentioned about IN parameters, not OUTs. IN parameters can be > optional to accept existing usage. pg_wal_replay_pause() is changed > that way in the attached. > > If all of you still disagree with my proposal, I withdraw it. I don't find any problem with this approach as well, but I personally feel that the other approach where we don't wait in any API and just return the recovery pause state is much simpler and more flexible. So I will make the pending changes in that patch and let's see what are the other opinion and based on that we can conclude. Thanks for the patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
At Tue, 9 Feb 2021 12:27:21 +0530, Bharath Rupireddy wrote in > What I meant was that if we were to add waiting logic inside > pg_wal_replay_pause, we should also have a timeout with some default > value, to avoid pg_wal_replay_pause waiting forever in the waiting > loop. Within that timeout, if the recovery isn't paused, > pg_wal_replay_pause will return probably a warning and a false(this > requires us to change the return value of the existing > pg_wal_replay_pause)? I thought that rm_redo finishes shortly unless any trouble happens. But on second thought, I found that I forgot a case of a recovery-conflict. So as you pointed out, pg_wal_replay_pause() needs a flag 'wait' to wait for a pause established. And the flag can be turned into "timeout". # And the prevous verision had another silly bug. > To avoid changing the existing API and return type, a new function > pg_get_wal_replay_pause_state is introduced. I mentioned about IN parameters, not OUTs. IN parameters can be optional to accept existing usage. pg_wal_replay_pause() is changed that way in the attached. If all of you still disagree with my proposal, I withdraw it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 1ab31a9056..7eb93f74dd 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25320,14 +25320,19 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); pg_wal_replay_pause -pg_wal_replay_pause () +pg_wal_replay_pause ( + timeout integer + ) void Pauses recovery. While recovery is paused, no further database changes are applied. If hot standby is active, all new queries will see the same consistent snapshot of the database, and no further query -conflicts will be generated until recovery is resumed. +conflicts will be generated until recovery is resumed. Zero or +positive timeout value means the function errors out after that +milliseconds elapsed before recovery is paused (default is -1, wait +forever). This function is restricted to superusers by default, but other users diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8e3b5df7dc..8fd614cded 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6072,10 +6072,61 @@ RecoveryIsPaused(void) return recoveryPause; } +/* + * Pauses recovery. + * + * It is guaranteed that no WAL replay happens after this function returns. If + * timeout is zero or positive, emits ERROR when the timeout is reached before + * recovery is paused. + */ void -SetRecoveryPause(bool recoveryPause) +SetRecoveryPause(bool recoveryPause, int timeout) { + TimestampTz finish_time = 0; + TimestampTz now; + int sleep_ms; + SpinLockAcquire(&XLogCtl->info_lck); + + /* No need of timeout in the startup process */ + Assert(!InRecovery || timeout < 0); + + /* +* Wait for the concurrent rm_redo() to finish, so that no records will be +* applied after this function returns. No need to wait while resuming. +* Anyway we are requesting a recovery pause, we don't mind a possible slow +* down of recovery by the info_lck here. We don't need to wait in the +* startup process since no concurrent rm_redo() runs. +*/ + while(!InRecovery && + recoveryPause && !XLogCtl->recoveryPause && + XLogCtl->replayEndRecPtr != XLogCtl->lastReplayedEndRecPtr) + { + SpinLockRelease(&XLogCtl->info_lck); + now = GetCurrentTimestamp(); + + if (timeout >= 0) + { + if (timeout > 0 && finish_time == 0) + finish_time = TimestampTzPlusMilliseconds(now, timeout); + + if (finish_time < now) + ereport(ERROR, + (errcode(ERRCODE_SQL_STATEMENT_NOT_YET_COMPLETE), +errmsg ("could not pause recovery: timed out"))); + } + + CHECK_FOR_INTERRUPTS(); + + sleep_ms = 1L; /* 10 ms */ + + /* finish_time may be reached earlier than 10ms */ + if (finish_time > 0) + Min(sleep_ms, TimestampDifferenceMilliseconds(now, finish_time)); + + pg_usleep(sleep_ms); + SpinLockAcquire(&XLogCtl->info_lck); + } XLogCtl->recoveryPause = recoveryPause; SpinLockRelease(&XLogCtl->info_lck); } @@ -6270,7 +6321,7 @@ RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue
Re: Is Recovery actually paused?
On Tue, Feb 9, 2021 at 11:30 AM Kyotaro Horiguchi wrote: > > At Tue, 9 Feb 2021 09:58:30 +0530, Bharath Rupireddy > wrote in > > On Tue, Feb 9, 2021 at 9:48 AM Dilip Kumar wrote: > > > > > > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA wrote: > > > > > > > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST) > > > > Kyotaro Horiguchi wrote: > > > > > If we are going to introduce that complexity, I'd like to re-propose > > > > > to introduce interlocking between the recovery side and the > > > > > pause-requestor side instead of introducing the intermediate state, > > > > > which is the cause of the complexity. > > > > > > > > > > The attached PoC patch adds: > > > > > > > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a > > > > > info_lck since the check is done in the existing lock section. > > > > > > > > > > - Interlocking between the above and SetRecoveryPause without adding a > > > > > shared variable. > > > > > (This is what I called "synchronous" before.) > > > > > > > > I think waiting in pg_wal_replay_pasue is a possible option, but this > > > > will > > > > also introduce other complexity to codes such as possibility of waiting > > > > for > > > > long or for ever. For example, waiting in SetRecoveryPause as in your > > > > POC > > > > patch appears to make recovery stuck in RecoveryRequiresIntParameter. > > > > > > > > > > I agree with this, I think we previously discussed these approaches > > > where we can wait in pg_wal_replay_pasue() or > > > pg_is_wal_replay_pasued(). In fact, we had an older version where we > > > put the wait in pg_is_wal_replay_pasued(). But it appeared that doing > > > so will add extra complexity as well as instead of waiting in these > > > APIs the wait logic can be implemented in the application code which > > > is actually using these APIs and IMHO that will give better control to > > > the users. > > > > And also, having waiting logic in pg_wal_replay_pasue() or > > pg_is_wal_replay_pasued() required changes to the existing API such as > > a timeout to not allow them infinitely waiting. > > I don't understand that. pg_wal_replay_pause() is defined as "pausees > recovery". so it is the correct behavior to wait actual pause. > pg_is_wal_replay_paused() doesn't wait for anything at all. What I meant was that if we were to add waiting logic inside pg_wal_replay_pause, we should also have a timeout with some default value, to avoid pg_wal_replay_pause waiting forever in the waiting loop. Within that timeout, if the recovery isn't paused, pg_wal_replay_pause will return probably a warning and a false(this requires us to change the return value of the existing pg_wal_replay_pause)? To avoid changing the existing API and return type, a new function pg_get_wal_replay_pause_state is introduced. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
Sorry, I made a mistake here. At Tue, 09 Feb 2021 14:55:23 +0900 (JST), Kyotaro Horiguchi wrote in > At Tue, 9 Feb 2021 09:47:58 +0530, Dilip Kumar wrote > in > > APIs the wait logic can be implemented in the application code which > > is actually using these APIs and IMHO that will give better control to > > the users. > > Year, with the PoC pg_wal_replay_pause() can make a short wait as a > side-effect but the tri-state patch also can add a function to wait > for the state suffices. I said that it is surprising that pg_is_wal_replay_paused() waits for the state change. But I didn't say that pg_wal_replay_pause() shouldn't wait for the actual pause. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
At Tue, 9 Feb 2021 09:58:30 +0530, Bharath Rupireddy wrote in > On Tue, Feb 9, 2021 at 9:48 AM Dilip Kumar wrote: > > > > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA wrote: > > > > > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST) > > > Kyotaro Horiguchi wrote: > > > > If we are going to introduce that complexity, I'd like to re-propose > > > > to introduce interlocking between the recovery side and the > > > > pause-requestor side instead of introducing the intermediate state, > > > > which is the cause of the complexity. > > > > > > > > The attached PoC patch adds: > > > > > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a > > > > info_lck since the check is done in the existing lock section. > > > > > > > > - Interlocking between the above and SetRecoveryPause without adding a > > > > shared variable. > > > > (This is what I called "synchronous" before.) > > > > > > I think waiting in pg_wal_replay_pasue is a possible option, but this will > > > also introduce other complexity to codes such as possibility of waiting > > > for > > > long or for ever. For example, waiting in SetRecoveryPause as in your POC > > > patch appears to make recovery stuck in RecoveryRequiresIntParameter. > > > > > > > I agree with this, I think we previously discussed these approaches > > where we can wait in pg_wal_replay_pasue() or > > pg_is_wal_replay_pasued(). In fact, we had an older version where we > > put the wait in pg_is_wal_replay_pasued(). But it appeared that doing > > so will add extra complexity as well as instead of waiting in these > > APIs the wait logic can be implemented in the application code which > > is actually using these APIs and IMHO that will give better control to > > the users. > > And also, having waiting logic in pg_wal_replay_pasue() or > pg_is_wal_replay_pasued() required changes to the existing API such as > a timeout to not allow them infinitely waiting. I don't understand that. pg_wal_replay_pause() is defined as "pausees recovery". so it is the correct behavior to wait actual pause. pg_is_wal_replay_paused() doesn't wait for anything at all. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
At Tue, 9 Feb 2021 09:47:58 +0530, Dilip Kumar wrote in > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA wrote: > > > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST) > > Kyotaro Horiguchi wrote: > > > > > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar > > > wrote in > > > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA wrote: > > > > > > > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > > > > Kyotaro Horiguchi wrote: > > > > > > > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA > > > > > > wrote in > > > > > > > > > > I think the right fix should be that the state should never > > > > > > > > > > go from > > > > > > > > > > ‘paused’ to ‘pause requested’ so I think > > > > > > > > > > pg_wal_replay_pause should take > > > > > > > > > > care of that. > > > > > > > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, > > > > > > > > > but I wonder > > > > > > > > > it can not handle the case that a user resume and pause again > > > > > > > > > while a sleep. > > > > > > > > > > > > > > > > Right, we will have to check and set in the loop. But we > > > > > > > > should not > > > > > > > > allow the state to go from paused to pause requested > > > > > > > > irrespective of > > > > > > > > this. > > > > > > > > > > > > > > I agree with you. > > > > > > > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming > > > > > > we > > > > > > immediately change the state to PAUSE always we see REQUESTED in the > > > > > > waiting loop, despite that we allow change the state from PAUSE to > > > > > > REQUESTED via NOT_PAUSED between two successive loop condition > > > > > > checks? > > > > > > > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > > > > observe 'pause requested' during a sleep alghough the time window is > > > > > short. > > > > > It seems a bit odd that pg_wal_replay_pause changes the state like > > > > > this > > > > > because This state meeans that recovery may not be 'paused'. > > > > > > > > Yeah, this appears wrong that after 'paused' we go back to 'pause > > > > requested'. the logical state transition should always be as below > > > > > > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > > > > request and then paused but there is nothing wrong with going to > > > > paused) > > > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or > > > > get paused) > > > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > > > > PAUSE_REQUESTED without going to NOT PAUSED) > > > > > > I didn't asked about the internal logical correctness, but asked about > > > *actual harm* revealed to users. I don't see any actual harm in the > > > "wrong" transition because: > > > > Actually, the incorrect state transition is not so harmful except that > > users can observe unnecessary state changes. However, I don't think any > > actual harm in prohibit the incorrect state transition. So, I think we > > can do it. > > > > > If we are going to introduce that complexity, I'd like to re-propose > > > to introduce interlocking between the recovery side and the > > > pause-requestor side instead of introducing the intermediate state, > > > which is the cause of the complexity. > > > > > > The attached PoC patch adds: > > > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a > > > info_lck since the check is done in the existing lock section. > > > > > > - Interlocking between the above and SetRecoveryPause without adding a > > > shared variable. > > > (This is what I called "synchronous" before.) > > > > I think waiting in pg_wal_replay_pasue is a possible option, but this will > > also introduce other complexity to codes such as possibility of waiting for > > long or for ever. For example, waiting in SetRecoveryPause as in your POC > > patch appears to make recovery stuck in RecoveryRequiresIntParameter. Ah. Yes, startup process does not need to wait. That is a bug of the patch. No other callers don't cause the self dead lock. > I agree with this, I think we previously discussed these approaches > where we can wait in pg_wal_replay_pasue() or > pg_is_wal_replay_pasued(). In fact, we had an older version where we > put the wait in pg_is_wal_replay_pasued(). But it appeared that doing Note that the expected waiting period is while calling rmgr_redo(). If it is stuck for a long time, that suggests something's going wrong. > so will add extra complexity as well as instead of waiting in these > APIs the wait logic can be implemented in the application code which > is actually using these APIs and IMHO that will give better control to > the users. Year, with the PoC pg_wal_replay_pause() can make a short wait as a side-effect but the tri-state patch also can add a function to wait for the state suffices. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlo
Re: Is Recovery actually paused?
At Tue, 9 Feb 2021 12:23:23 +0900, Yugo NAGATA wrote in > On Tue, 09 Feb 2021 10:58:04 +0900 (JST) > Kyotaro Horiguchi wrote: > > I didn't asked about the internal logical correctness, but asked about > > *actual harm* revealed to users. I don't see any actual harm in the > > "wrong" transition because: > > Actually, the incorrect state transition is not so harmful except that > users can observe unnecessary state changes. However, I don't think any > actual harm in prohibit the incorrect state transition. So, I think we > can do it. I don't say that we cannot do that. Just it is needeless. > > If we are going to introduce that complexity, I'd like to re-propose > > to introduce interlocking between the recovery side and the > > pause-requestor side instead of introducing the intermediate state, > > which is the cause of the complexity. > > > > The attached PoC patch adds: > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a > > info_lck since the check is done in the existing lock section. > > > > - Interlocking between the above and SetRecoveryPause without adding a > > shared variable. > > (This is what I called "synchronous" before.) > > I think waiting in pg_wal_replay_pasue is a possible option, but this will > also introduce other complexity to codes such as possibility of waiting for > long or for ever. For example, waiting in SetRecoveryPause as in your POC > patch appears to make recovery stuck in RecoveryRequiresIntParameter. That is easily avoidable CFI in the loop. > By the way, attaching other patch to a thread without the original patch > will make commitfest and cfbot APP confused... Oops! Sorry for that. I forgot to append .txt or such to the file name. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Tue, Feb 9, 2021 at 9:48 AM Dilip Kumar wrote: > > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA wrote: > > > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST) > > Kyotaro Horiguchi wrote: > > > > > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar > > > wrote in > > > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA wrote: > > > > > > > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > > > > Kyotaro Horiguchi wrote: > > > > > > > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA > > > > > > wrote in > > > > > > > > > > I think the right fix should be that the state should never > > > > > > > > > > go from > > > > > > > > > > ‘paused’ to ‘pause requested’ so I think > > > > > > > > > > pg_wal_replay_pause should take > > > > > > > > > > care of that. > > > > > > > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, > > > > > > > > > but I wonder > > > > > > > > > it can not handle the case that a user resume and pause again > > > > > > > > > while a sleep. > > > > > > > > > > > > > > > > Right, we will have to check and set in the loop. But we > > > > > > > > should not > > > > > > > > allow the state to go from paused to pause requested > > > > > > > > irrespective of > > > > > > > > this. > > > > > > > > > > > > > > I agree with you. > > > > > > > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming > > > > > > we > > > > > > immediately change the state to PAUSE always we see REQUESTED in the > > > > > > waiting loop, despite that we allow change the state from PAUSE to > > > > > > REQUESTED via NOT_PAUSED between two successive loop condition > > > > > > checks? > > > > > > > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > > > > observe 'pause requested' during a sleep alghough the time window is > > > > > short. > > > > > It seems a bit odd that pg_wal_replay_pause changes the state like > > > > > this > > > > > because This state meeans that recovery may not be 'paused'. > > > > > > > > Yeah, this appears wrong that after 'paused' we go back to 'pause > > > > requested'. the logical state transition should always be as below > > > > > > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > > > > request and then paused but there is nothing wrong with going to > > > > paused) > > > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or > > > > get paused) > > > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > > > > PAUSE_REQUESTED without going to NOT PAUSED) > > > > > > I didn't asked about the internal logical correctness, but asked about > > > *actual harm* revealed to users. I don't see any actual harm in the > > > "wrong" transition because: > > > > Actually, the incorrect state transition is not so harmful except that > > users can observe unnecessary state changes. However, I don't think any > > actual harm in prohibit the incorrect state transition. So, I think we > > can do it. > > > > > If we are going to introduce that complexity, I'd like to re-propose > > > to introduce interlocking between the recovery side and the > > > pause-requestor side instead of introducing the intermediate state, > > > which is the cause of the complexity. > > > > > > The attached PoC patch adds: > > > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a > > > info_lck since the check is done in the existing lock section. > > > > > > - Interlocking between the above and SetRecoveryPause without adding a > > > shared variable. > > > (This is what I called "synchronous" before.) > > > > I think waiting in pg_wal_replay_pasue is a possible option, but this will > > also introduce other complexity to codes such as possibility of waiting for > > long or for ever. For example, waiting in SetRecoveryPause as in your POC > > patch appears to make recovery stuck in RecoveryRequiresIntParameter. > > > > I agree with this, I think we previously discussed these approaches > where we can wait in pg_wal_replay_pasue() or > pg_is_wal_replay_pasued(). In fact, we had an older version where we > put the wait in pg_is_wal_replay_pasued(). But it appeared that doing > so will add extra complexity as well as instead of waiting in these > APIs the wait logic can be implemented in the application code which > is actually using these APIs and IMHO that will give better control to > the users. And also, having waiting logic in pg_wal_replay_pasue() or pg_is_wal_replay_pasued() required changes to the existing API such as a timeout to not allow them infinitely waiting. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA wrote: > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST) > Kyotaro Horiguchi wrote: > > > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar > > wrote in > > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA wrote: > > > > > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > > > Kyotaro Horiguchi wrote: > > > > > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA > > > > > wrote in > > > > > > > > > I think the right fix should be that the state should never > > > > > > > > > go from > > > > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause > > > > > > > > > should take > > > > > > > > > care of that. > > > > > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but > > > > > > > > I wonder > > > > > > > > it can not handle the case that a user resume and pause again > > > > > > > > while a sleep. > > > > > > > > > > > > > > Right, we will have to check and set in the loop. But we should > > > > > > > not > > > > > > > allow the state to go from paused to pause requested irrespective > > > > > > > of > > > > > > > this. > > > > > > > > > > > > I agree with you. > > > > > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > > > > > immediately change the state to PAUSE always we see REQUESTED in the > > > > > waiting loop, despite that we allow change the state from PAUSE to > > > > > REQUESTED via NOT_PAUSED between two successive loop condition checks? > > > > > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > > > observe 'pause requested' during a sleep alghough the time window is > > > > short. > > > > It seems a bit odd that pg_wal_replay_pause changes the state like this > > > > because This state meeans that recovery may not be 'paused'. > > > > > > Yeah, this appears wrong that after 'paused' we go back to 'pause > > > requested'. the logical state transition should always be as below > > > > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > > > request and then paused but there is nothing wrong with going to > > > paused) > > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get > > > paused) > > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > > > PAUSE_REQUESTED without going to NOT PAUSED) > > > > I didn't asked about the internal logical correctness, but asked about > > *actual harm* revealed to users. I don't see any actual harm in the > > "wrong" transition because: > > Actually, the incorrect state transition is not so harmful except that > users can observe unnecessary state changes. However, I don't think any > actual harm in prohibit the incorrect state transition. So, I think we > can do it. > > > If we are going to introduce that complexity, I'd like to re-propose > > to introduce interlocking between the recovery side and the > > pause-requestor side instead of introducing the intermediate state, > > which is the cause of the complexity. > > > > The attached PoC patch adds: > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a > > info_lck since the check is done in the existing lock section. > > > > - Interlocking between the above and SetRecoveryPause without adding a > > shared variable. > > (This is what I called "synchronous" before.) > > I think waiting in pg_wal_replay_pasue is a possible option, but this will > also introduce other complexity to codes such as possibility of waiting for > long or for ever. For example, waiting in SetRecoveryPause as in your POC > patch appears to make recovery stuck in RecoveryRequiresIntParameter. > I agree with this, I think we previously discussed these approaches where we can wait in pg_wal_replay_pasue() or pg_is_wal_replay_pasued(). In fact, we had an older version where we put the wait in pg_is_wal_replay_pasued(). But it appeared that doing so will add extra complexity as well as instead of waiting in these APIs the wait logic can be implemented in the application code which is actually using these APIs and IMHO that will give better control to the users. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Tue, Feb 9, 2021 at 7:28 AM Kyotaro Horiguchi wrote: > > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar wrote > in > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA wrote: > > > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > > Kyotaro Horiguchi wrote: > > > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA > > > > wrote in > > > > > > > > I think the right fix should be that the state should never go > > > > > > > > from > > > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause > > > > > > > > should take > > > > > > > > care of that. > > > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I > > > > > > > wonder > > > > > > > it can not handle the case that a user resume and pause again > > > > > > > while a sleep. > > > > > > > > > > > > Right, we will have to check and set in the loop. But we should not > > > > > > allow the state to go from paused to pause requested irrespective of > > > > > > this. > > > > > > > > > > I agree with you. > > > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > > > > immediately change the state to PAUSE always we see REQUESTED in the > > > > waiting loop, despite that we allow change the state from PAUSE to > > > > REQUESTED via NOT_PAUSED between two successive loop condition checks? > > > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > > observe 'pause requested' during a sleep alghough the time window is > > > short. > > > It seems a bit odd that pg_wal_replay_pause changes the state like this > > > because This state meeans that recovery may not be 'paused'. > > > > Yeah, this appears wrong that after 'paused' we go back to 'pause > > requested'. the logical state transition should always be as below > > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > > request and then paused but there is nothing wrong with going to > > paused) > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get > > paused) > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > > PAUSE_REQUESTED without going to NOT PAUSED) > > I didn't asked about the internal logical correctness, but asked about > *actual harm* revealed to users. I don't see any actual harm in the > "wrong" transition because: > > 1. It is not wrong nor strange that the invoker of pg_wal_replay_pause > sees the state PAUSE_REQUESTED before it changes to PAUSED. Even if > the previous state was PAUSED, it is no business of the requestors. The 'pg_wal_replay_pause' request to pause the recovery so it is fine to first change the state to PAUSE_REQUESTED and then to PAUSED. But if the recovery is already paused then what is the point in bringing the state back to PAUSE_REQUESTED. For example, suppose the tool want to raise the pause request and wait until recovery actually paused, so if it was already paused then if we bring it back to PAUSE_REQUESTED then it doesn't look correct and we might need to do an extra wait cycle in the tool until it reaches back to PAUSED, I don't think that is really a best design. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Tue, 09 Feb 2021 10:58:04 +0900 (JST) Kyotaro Horiguchi wrote: > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar wrote > in > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA wrote: > > > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > > Kyotaro Horiguchi wrote: > > > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA > > > > wrote in > > > > > > > > I think the right fix should be that the state should never go > > > > > > > > from > > > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause > > > > > > > > should take > > > > > > > > care of that. > > > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I > > > > > > > wonder > > > > > > > it can not handle the case that a user resume and pause again > > > > > > > while a sleep. > > > > > > > > > > > > Right, we will have to check and set in the loop. But we should not > > > > > > allow the state to go from paused to pause requested irrespective of > > > > > > this. > > > > > > > > > > I agree with you. > > > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > > > > immediately change the state to PAUSE always we see REQUESTED in the > > > > waiting loop, despite that we allow change the state from PAUSE to > > > > REQUESTED via NOT_PAUSED between two successive loop condition checks? > > > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > > observe 'pause requested' during a sleep alghough the time window is > > > short. > > > It seems a bit odd that pg_wal_replay_pause changes the state like this > > > because This state meeans that recovery may not be 'paused'. > > > > Yeah, this appears wrong that after 'paused' we go back to 'pause > > requested'. the logical state transition should always be as below > > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > > request and then paused but there is nothing wrong with going to > > paused) > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get > > paused) > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > > PAUSE_REQUESTED without going to NOT PAUSED) > > I didn't asked about the internal logical correctness, but asked about > *actual harm* revealed to users. I don't see any actual harm in the > "wrong" transition because: Actually, the incorrect state transition is not so harmful except that users can observe unnecessary state changes. However, I don't think any actual harm in prohibit the incorrect state transition. So, I think we can do it. > If we are going to introduce that complexity, I'd like to re-propose > to introduce interlocking between the recovery side and the > pause-requestor side instead of introducing the intermediate state, > which is the cause of the complexity. > > The attached PoC patch adds: > > - A solid checkpoint just before calling rm_redo. It doesn't add a > info_lck since the check is done in the existing lock section. > > - Interlocking between the above and SetRecoveryPause without adding a > shared variable. > (This is what I called "synchronous" before.) I think waiting in pg_wal_replay_pasue is a possible option, but this will also introduce other complexity to codes such as possibility of waiting for long or for ever. For example, waiting in SetRecoveryPause as in your POC patch appears to make recovery stuck in RecoveryRequiresIntParameter. By the way, attaching other patch to a thread without the original patch will make commitfest and cfbot APP confused... Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar wrote in > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA wrote: > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > > Kyotaro Horiguchi wrote: > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA > > > wrote in > > > > > > > I think the right fix should be that the state should never go > > > > > > > from > > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause > > > > > > > should take > > > > > > > care of that. > > > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I > > > > > > wonder > > > > > > it can not handle the case that a user resume and pause again while > > > > > > a sleep. > > > > > > > > > > Right, we will have to check and set in the loop. But we should not > > > > > allow the state to go from paused to pause requested irrespective of > > > > > this. > > > > > > > > I agree with you. > > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > > > immediately change the state to PAUSE always we see REQUESTED in the > > > waiting loop, despite that we allow change the state from PAUSE to > > > REQUESTED via NOT_PAUSED between two successive loop condition checks? > > > > If a user call pg_wal_replay_pause while recovery is paused, users can > > observe 'pause requested' during a sleep alghough the time window is short. > > It seems a bit odd that pg_wal_replay_pause changes the state like this > > because This state meeans that recovery may not be 'paused'. > > Yeah, this appears wrong that after 'paused' we go back to 'pause > requested'. the logical state transition should always be as below > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to > request and then paused but there is nothing wrong with going to > paused) > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get > paused) > PAUSED -> NOT PAUSED (from PAUSED we should not go to the > PAUSE_REQUESTED without going to NOT PAUSED) I didn't asked about the internal logical correctness, but asked about *actual harm* revealed to users. I don't see any actual harm in the "wrong" transition because: 1. It is not wrong nor strange that the invoker of pg_wal_replay_pause sees the state PAUSE_REQUESTED before it changes to PAUSED. Even if the previous state was PAUSED, it is no business of the requestors. 2. It is no harm in the recovery side since PAUSE_REQUESTED and PAUSED are effectively the same state. 3. After we inhibited the direct transition from PAUSED->PAUSE_REQUESTED, the effectively the same transition PAUSED->NOT_PAUSED->PAUSE_REQUESTED is still allowed. The inhibition of the former transition doesn't protect anything other than seeming correctness of the transition. If we are going to introduce that complexity, I'd like to re-propose to introduce interlocking between the recovery side and the pause-requestor side instead of introducing the intermediate state, which is the cause of the complexity. The problem is due to the looseness of checking for pause requests in the existing checkponts, and the window after the last checkpoint until calling rm_redo(). The attached PoC patch adds: - A solid checkpoint just before calling rm_redo. It doesn't add a info_lck since the check is done in the existing lock section. - Interlocking between the above and SetRecoveryPause without adding a shared variable. (This is what I called "synchronous" before.) There's a concern about pausing after updating XlogCtl->replayEndRecPtr but I don't see an issue yet.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8e3b5df7dc..26aadb4178 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6076,6 +6076,20 @@ void SetRecoveryPause(bool recoveryPause) { SpinLockAcquire(&XLogCtl->info_lck); + + /* + * Wait for the application of the record being applied to finish, so that + * no records will be applied after this function returns. We don't need to + * wait when ending a pause. Anyway we are requesting a recovery pause, we + * don't mind a possible slow down of recovery by the info_lck here. + */ + while(recoveryPause && !XLogCtl->recoveryPause && + XLogCtl->replayEndRecPtr != XLogCtl->lastReplayedEndRecPtr) + { + SpinLockRelease(&XLogCtl->info_lck); + pg_usleep(1L); /* 10 ms */ + SpinLockAcquire(&XLogCtl->info_lck); + } XLogCtl->recoveryPause = recoveryPause; SpinLockRelease(&XLogCtl->info_lck); } @@ -7262,6 +7276,7 @@ StartupXLOG(void) do { bool switchedTLI = false; +bool pause_requested = false; #ifdef WAL_DEBUG if (XLOG_DEBUG || @@ -7292,11 +7307,9 @@ StartupXLOG(void) * Note that we intentionally don't take the info_lck spinlock * here. We might therefore read a slightly stale value of * the recoveryPause flag, but it can't be ver
Re: Is Recovery actually paused?
On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA wrote: > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST) > Kyotaro Horiguchi wrote: > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA wrote > > in > > > > > > I think the right fix should be that the state should never go from > > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause > > > > > > should take > > > > > > care of that. > > > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I > > > > > wonder > > > > > it can not handle the case that a user resume and pause again while a > > > > > sleep. > > > > > > > > Right, we will have to check and set in the loop. But we should not > > > > allow the state to go from paused to pause requested irrespective of > > > > this. > > > > > > I agree with you. > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > > immediately change the state to PAUSE always we see REQUESTED in the > > waiting loop, despite that we allow change the state from PAUSE to > > REQUESTED via NOT_PAUSED between two successive loop condition checks? > > If a user call pg_wal_replay_pause while recovery is paused, users can > observe 'pause requested' during a sleep alghough the time window is short. > It seems a bit odd that pg_wal_replay_pause changes the state like this > because This state meeans that recovery may not be 'paused'. Yeah, this appears wrong that after 'paused' we go back to 'pause requested'. the logical state transition should always be as below NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to request and then paused but there is nothing wrong with going to paused) PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get paused) PAUSED -> NOT PAUSED (from PAUSED we should not go to the PAUSE_REQUESTED without going to NOT PAUSED) -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Mon, 08 Feb 2021 17:32:46 +0900 (JST) Kyotaro Horiguchi wrote: > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA wrote in > > > > > I think the right fix should be that the state should never go from > > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should > > > > > take > > > > > care of that. > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > > > > it can not handle the case that a user resume and pause again while a > > > > sleep. > > > > > > Right, we will have to check and set in the loop. But we should not > > > allow the state to go from paused to pause requested irrespective of > > > this. > > > > I agree with you. > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we > immediately change the state to PAUSE always we see REQUESTED in the > waiting loop, despite that we allow change the state from PAUSE to > REQUESTED via NOT_PAUSED between two successive loop condition checks? If a user call pg_wal_replay_pause while recovery is paused, users can observe 'pause requested' during a sleep alghough the time window is short. It seems a bit odd that pg_wal_replay_pause changes the state like this because This state meeans that recovery may not be 'paused'. -- Yugo NAGATA
Re: Is Recovery actually paused?
At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA wrote in > > > > I think the right fix should be that the state should never go from > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should > > > > take > > > > care of that. > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > > > it can not handle the case that a user resume and pause again while a > > > sleep. > > > > Right, we will have to check and set in the loop. But we should not > > allow the state to go from paused to pause requested irrespective of > > this. > > I agree with you. Is there any actual harm if PAUSED returns to REQUESETED, assuming we immediately change the state to PAUSE always we see REQUESTED in the waiting loop, despite that we allow change the state from PAUSE to REQUESTED via NOT_PAUSED between two successive loop condition checks? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Mon, 8 Feb 2021 09:35:00 +0530 Dilip Kumar wrote: > On Mon, Feb 8, 2021 at 8:18 AM Yugo NAGATA wrote: > > > > On Mon, 8 Feb 2021 07:51:22 +0530 > > Dilip Kumar wrote: > > > > > On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA wrote: > > > > > > > Hi, > > > > > > > > On Sun, 7 Feb 2021 19:27:02 +0530 > > > > Dilip Kumar wrote: > > > > > > > > > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy > > > > > wrote: > > > > > > > > > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy > > > > > > wrote: > > > > > > > > We can not do that, basically, under one lock we need to check > > > > > > > > the > > > > > > > > state and set it to pause. Because by the time you release the > > > > lock > > > > > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want > > > > > > > > to > > > > set > > > > > > > > it to RECOVERY_PAUSED. > > > > > > > > > > > > > > Got it. Thanks. > > > > > > > > > > > > Hi Dilip, I have one more question: > > > > > > > > > > > > +/* test for recovery pause, if user has requested the > > > > > > pause */ > > > > > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState > > > > > > == > > > > > > +RECOVERY_PAUSE_REQUESTED) > > > > > > +recoveryPausesHere(false); > > > > > > + > > > > > > +now = GetCurrentTimestamp(); > > > > > > + > > > > > > > > > > > > Do we need now = GetCurrentTimestamp(); here? Because, I see that > > > > > > whenever the variable now is used within the for loop in > > > > > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being > > > > > > used within case XLOG_FROM_STREAM: > > > > > > > > > > > > Am I missing something? > > > > > > > > > > Yeah, I don't see any reason for doing this, maybe it got copy pasted > > > > > by mistake. Thanks for observing this. > > > > > > > > I also have a question: > > > > > > > > @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char > > > > *param_name, int currValue, int minValue > > > >currValue, > > > >minValue))); > > > > > > > > - SetRecoveryPause(true); > > > > + SetRecoveryPause(RECOVERY_PAUSED); > > > > > > > > ereport(LOG, > > > > (errmsg("recovery has paused"), > > > > errdetail("If recovery is > > > > unpaused, the server will shut down."), > > > > errhint("You can then restart > > > > the > > > > server after making the necessary configuration changes."))); > > > > > > > > - while (RecoveryIsPaused()) > > > > + while (GetRecoveryPauseState() != > > > > RECOVERY_NOT_PAUSED) > > > > { > > > > HandleStartupProcInterrupts(); > > > > > > > > > > > > > > > > If a user call pg_wal_replay_pause while waiting in > > > > RecoveryRequiresIntParameter, > > > > the state become 'pause requested' and this never returns to 'paused'. > > > > Should we check recoveryPauseState in this loop as in > > > > > > > > > I think the right fix should be that the state should never go from > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take > > > care of that. > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > > it can not handle the case that a user resume and pause again while a sleep. > > Right, we will have to check and set in the loop. But we should not > allow the state to go from paused to pause requested irrespective of > this. I agree with you. -- Yugo NAGATA
Re: Is Recovery actually paused?
On Mon, Feb 8, 2021 at 9:49 AM Bharath Rupireddy wrote: > > On Mon, Feb 8, 2021 at 9:35 AM Dilip Kumar wrote: > > > > > If a user call pg_wal_replay_pause while waiting in > > > > > RecoveryRequiresIntParameter, > > > > > the state become 'pause requested' and this never returns to 'paused'. > > > > > Should we check recoveryPauseState in this loop as in > > > > > > > > > > > > I think the right fix should be that the state should never go from > > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should > > > > take > > > > care of that. > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > > > it can not handle the case that a user resume and pause again while a > > > sleep. > > > > Right, we will have to check and set in the loop. But we should not > > allow the state to go from paused to pause requested irrespective of > > this. > > We can think of a state machine with the states "not paused", "pause > requested", "paused". While we can go to "not paused" from any state, > but cannot go to "pause requested" from "paused". > > So, will pg_wal_replay_pause throw an error or warning or silently > return when it's called and the state is "paused" already? It should just silently return because pg_wal_replay_pause just claim it request to pause, but it not mean that it can not pause immediately. Maybe we > should add better commenting in pg_wal_replay_pause why we don't set > "pause requested" when the state is already "paused". > And also, if we are adding below code in the > RecoveryRequiresIntParameter loop, it's better to make it a function, > like your earlier patch. > > /* > * If recovery pause is requested then set it paused. While we are in > * the loop, user might resume and pause again so set this every time. > */ > SpinLockAcquire(&XLogCtl->info_lck); > if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED) > XLogCtl->recoveryPauseState = RECOVERY_PAUSED; > SpinLockRelease(&XLogCtl->info_lck); Yes, it should go back to function now as in the older versions. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Mon, Feb 8, 2021 at 9:35 AM Dilip Kumar wrote: > > > > If a user call pg_wal_replay_pause while waiting in > > > > RecoveryRequiresIntParameter, > > > > the state become 'pause requested' and this never returns to 'paused'. > > > > Should we check recoveryPauseState in this loop as in > > > > > > > > > I think the right fix should be that the state should never go from > > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take > > > care of that. > > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > > it can not handle the case that a user resume and pause again while a sleep. > > Right, we will have to check and set in the loop. But we should not > allow the state to go from paused to pause requested irrespective of > this. We can think of a state machine with the states "not paused", "pause requested", "paused". While we can go to "not paused" from any state, but cannot go to "pause requested" from "paused". So, will pg_wal_replay_pause throw an error or warning or silently return when it's called and the state is "paused" already? Maybe we should add better commenting in pg_wal_replay_pause why we don't set "pause requested" when the state is already "paused". And also, if we are adding below code in the RecoveryRequiresIntParameter loop, it's better to make it a function, like your earlier patch. /* * If recovery pause is requested then set it paused. While we are in * the loop, user might resume and pause again so set this every time. */ SpinLockAcquire(&XLogCtl->info_lck); if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED) XLogCtl->recoveryPauseState = RECOVERY_PAUSED; SpinLockRelease(&XLogCtl->info_lck); With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Mon, Feb 8, 2021 at 8:18 AM Yugo NAGATA wrote: > > On Mon, 8 Feb 2021 07:51:22 +0530 > Dilip Kumar wrote: > > > On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA wrote: > > > > > Hi, > > > > > > On Sun, 7 Feb 2021 19:27:02 +0530 > > > Dilip Kumar wrote: > > > > > > > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy > > > > wrote: > > > > > > > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy > > > > > wrote: > > > > > > > We can not do that, basically, under one lock we need to check the > > > > > > > state and set it to pause. Because by the time you release the > > > lock > > > > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to > > > set > > > > > > > it to RECOVERY_PAUSED. > > > > > > > > > > > > Got it. Thanks. > > > > > > > > > > Hi Dilip, I have one more question: > > > > > > > > > > +/* test for recovery pause, if user has requested the pause > > > > > */ > > > > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > > > > > +RECOVERY_PAUSE_REQUESTED) > > > > > +recoveryPausesHere(false); > > > > > + > > > > > +now = GetCurrentTimestamp(); > > > > > + > > > > > > > > > > Do we need now = GetCurrentTimestamp(); here? Because, I see that > > > > > whenever the variable now is used within the for loop in > > > > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being > > > > > used within case XLOG_FROM_STREAM: > > > > > > > > > > Am I missing something? > > > > > > > > Yeah, I don't see any reason for doing this, maybe it got copy pasted > > > > by mistake. Thanks for observing this. > > > > > > I also have a question: > > > > > > @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char > > > *param_name, int currValue, int minValue > > >currValue, > > >minValue))); > > > > > > - SetRecoveryPause(true); > > > + SetRecoveryPause(RECOVERY_PAUSED); > > > > > > ereport(LOG, > > > (errmsg("recovery has paused"), > > > errdetail("If recovery is > > > unpaused, the server will shut down."), > > > errhint("You can then restart the > > > server after making the necessary configuration changes."))); > > > > > > - while (RecoveryIsPaused()) > > > + while (GetRecoveryPauseState() != > > > RECOVERY_NOT_PAUSED) > > > { > > > HandleStartupProcInterrupts(); > > > > > > > > > > > > If a user call pg_wal_replay_pause while waiting in > > > RecoveryRequiresIntParameter, > > > the state become 'pause requested' and this never returns to 'paused'. > > > Should we check recoveryPauseState in this loop as in > > > > > > I think the right fix should be that the state should never go from > > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take > > care of that. > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder > it can not handle the case that a user resume and pause again while a sleep. Right, we will have to check and set in the loop. But we should not allow the state to go from paused to pause requested irrespective of this. I will make these changes and send the updated patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Mon, 8 Feb 2021 07:51:22 +0530 Dilip Kumar wrote: > On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA wrote: > > > Hi, > > > > On Sun, 7 Feb 2021 19:27:02 +0530 > > Dilip Kumar wrote: > > > > > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy > > > wrote: > > > > > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy > > > > wrote: > > > > > > We can not do that, basically, under one lock we need to check the > > > > > > state and set it to pause. Because by the time you release the > > lock > > > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to > > set > > > > > > it to RECOVERY_PAUSED. > > > > > > > > > > Got it. Thanks. > > > > > > > > Hi Dilip, I have one more question: > > > > > > > > +/* test for recovery pause, if user has requested the pause */ > > > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > > > > +RECOVERY_PAUSE_REQUESTED) > > > > +recoveryPausesHere(false); > > > > + > > > > +now = GetCurrentTimestamp(); > > > > + > > > > > > > > Do we need now = GetCurrentTimestamp(); here? Because, I see that > > > > whenever the variable now is used within the for loop in > > > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being > > > > used within case XLOG_FROM_STREAM: > > > > > > > > Am I missing something? > > > > > > Yeah, I don't see any reason for doing this, maybe it got copy pasted > > > by mistake. Thanks for observing this. > > > > I also have a question: > > > > @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char > > *param_name, int currValue, int minValue > >currValue, > >minValue))); > > > > - SetRecoveryPause(true); > > + SetRecoveryPause(RECOVERY_PAUSED); > > > > ereport(LOG, > > (errmsg("recovery has paused"), > > errdetail("If recovery is > > unpaused, the server will shut down."), > > errhint("You can then restart the > > server after making the necessary configuration changes."))); > > > > - while (RecoveryIsPaused()) > > + while (GetRecoveryPauseState() != > > RECOVERY_NOT_PAUSED) > > { > > HandleStartupProcInterrupts(); > > > > > > > > If a user call pg_wal_replay_pause while waiting in > > RecoveryRequiresIntParameter, > > the state become 'pause requested' and this never returns to 'paused'. > > Should we check recoveryPauseState in this loop as in > > > I think the right fix should be that the state should never go from > ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take > care of that. It makes sense to take care of this in pg_wal_replay_pause, but I wonder it can not handle the case that a user resume and pause again while a sleep. -- Yugo NAGATA
Re: Is Recovery actually paused?
On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA wrote: > Hi, > > On Sun, 7 Feb 2021 19:27:02 +0530 > Dilip Kumar wrote: > > > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy > > wrote: > > > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy > > > wrote: > > > > > We can not do that, basically, under one lock we need to check the > > > > > state and set it to pause. Because by the time you release the > lock > > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to > set > > > > > it to RECOVERY_PAUSED. > > > > > > > > Got it. Thanks. > > > > > > Hi Dilip, I have one more question: > > > > > > +/* test for recovery pause, if user has requested the pause */ > > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > > > +RECOVERY_PAUSE_REQUESTED) > > > +recoveryPausesHere(false); > > > + > > > +now = GetCurrentTimestamp(); > > > + > > > > > > Do we need now = GetCurrentTimestamp(); here? Because, I see that > > > whenever the variable now is used within the for loop in > > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being > > > used within case XLOG_FROM_STREAM: > > > > > > Am I missing something? > > > > Yeah, I don't see any reason for doing this, maybe it got copy pasted > > by mistake. Thanks for observing this. > > I also have a question: > > @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char > *param_name, int currValue, int minValue >currValue, >minValue))); > > - SetRecoveryPause(true); > + SetRecoveryPause(RECOVERY_PAUSED); > > ereport(LOG, > (errmsg("recovery has paused"), > errdetail("If recovery is > unpaused, the server will shut down."), > errhint("You can then restart the > server after making the necessary configuration changes."))); > > - while (RecoveryIsPaused()) > + while (GetRecoveryPauseState() != > RECOVERY_NOT_PAUSED) > { > HandleStartupProcInterrupts(); > > > > If a user call pg_wal_replay_pause while waiting in > RecoveryRequiresIntParameter, > the state become 'pause requested' and this never returns to 'paused'. > Should we check recoveryPauseState in this loop as in I think the right fix should be that the state should never go from ‘paused’ to ‘pause requested’ so I think pg_wal_replay_pause should take care of that. > -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
Hi, On Sun, 7 Feb 2021 19:27:02 +0530 Dilip Kumar wrote: > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy > wrote: > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy > > wrote: > > > > We can not do that, basically, under one lock we need to check the > > > > state and set it to pause. Because by the time you release the lock > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to set > > > > it to RECOVERY_PAUSED. > > > > > > Got it. Thanks. > > > > Hi Dilip, I have one more question: > > > > +/* test for recovery pause, if user has requested the pause */ > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > > +RECOVERY_PAUSE_REQUESTED) > > +recoveryPausesHere(false); > > + > > +now = GetCurrentTimestamp(); > > + > > > > Do we need now = GetCurrentTimestamp(); here? Because, I see that > > whenever the variable now is used within the for loop in > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being > > used within case XLOG_FROM_STREAM: > > > > Am I missing something? > > Yeah, I don't see any reason for doing this, maybe it got copy pasted > by mistake. Thanks for observing this. I also have a question: @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue currValue, minValue))); - SetRecoveryPause(true); + SetRecoveryPause(RECOVERY_PAUSED); ereport(LOG, (errmsg("recovery has paused"), errdetail("If recovery is unpaused, the server will shut down."), errhint("You can then restart the server after making the necessary configuration changes."))); - while (RecoveryIsPaused()) + while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) { HandleStartupProcInterrupts(); If a user call pg_wal_replay_pause while waiting in RecoveryRequiresIntParameter, the state become 'pause requested' and this never returns to 'paused'. Should we check recoveryPauseState in this loop as in recoveryPausesHere? Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy wrote: > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy > wrote: > > > We can not do that, basically, under one lock we need to check the > > > state and set it to pause. Because by the time you release the lock > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to set > > > it to RECOVERY_PAUSED. > > > > Got it. Thanks. > > Hi Dilip, I have one more question: > > +/* test for recovery pause, if user has requested the pause */ > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > +RECOVERY_PAUSE_REQUESTED) > +recoveryPausesHere(false); > + > +now = GetCurrentTimestamp(); > + > > Do we need now = GetCurrentTimestamp(); here? Because, I see that > whenever the variable now is used within the for loop in > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being > used within case XLOG_FROM_STREAM: > > Am I missing something? Yeah, I don't see any reason for doing this, maybe it got copy pasted by mistake. Thanks for observing this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From b5655514681317d997d8245068237acf5c6ada9d Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Wed, 27 Jan 2021 16:46:04 +0530 Subject: [PATCH v12] Provide a new interface to get the recovery pause status Currently, pg_is_wal_replay_paused, just checks whether the recovery pause is requested or not but it doesn't actually tell whether the recovery is actually paused or not. So basically there is no way for the user to know the actual status of the pause request. This patch provides a new interface pg_get_wal_replay_pause_state that will return the actual status of the recovery pause i.e.'not paused' if pause is not requested, 'pause requested' if pause is requested but recovery is not yet paused and 'paused' if recovery is actually paused. --- doc/src/sgml/func.sgml | 32 +++--- src/backend/access/transam/xlog.c | 62 -- src/backend/access/transam/xlogfuncs.c | 50 --- src/include/access/xlog.h | 12 +-- src/include/catalog/pg_proc.dat| 4 +++ 5 files changed, 132 insertions(+), 28 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 1ab31a9..9b2c429 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); boolean -Returns true if recovery is paused. +Returns true if recovery pause is requested. + + + + + + + pg_get_wal_replay_pause_state + +pg_get_wal_replay_pause_state () +text + + +Returns recovery pause state. The return values are +not paused if pause is not requested, +pause requested if pause is requested but recovery is +not yet paused and, paused if the recovery is +actually paused. @@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); void -Pauses recovery. While recovery is paused, no further database -changes are applied. If hot standby is active, all new queries will -see the same consistent snapshot of the database, and no further query -conflicts will be generated until recovery is resumed. +Request to pause recovery. A request doesn't mean that recovery stops +right away. If you want a guarantee that recovery is actually paused, +you need to check for the recovery pause state returned by +pg_get_wal_replay_pause_state(). Note that +pg_is_wal_replay_paused() returns whether a request +is made. While recovery is paused, no further database changes are applied. +If hot standby is active, all new queries will see the same consistent +snapshot of the database, and no further query conflicts will be generated +until recovery is resumed. This function is restricted to superusers by default, but other users diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8e3b5df..aa89e52 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -721,8 +721,8 @@ typedef struct XLogCtlData * only relevant for replication or archive recovery */ TimestampTz currentChunkStartTime; - /* Are we requested to pause recovery? */ - bool recoveryPause; + /* Recovery pause state */ + RecoveryPauseState recoveryPauseState; /* * lastFpwDisableRecPtr points to the start of the last replayed @@ -6019,7 +6019,7 @@ recoveryStopsAfter(XLogReaderState *record) } /* - * Wait until shared recoveryPause flag is cleared. + * Wait until shared recoveryPauseState is set to RECOVERY_NOT
Re: Is Recovery actually paused?
On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy wrote: > > We can not do that, basically, under one lock we need to check the > > state and set it to pause. Because by the time you release the lock > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to set > > it to RECOVERY_PAUSED. > > Got it. Thanks. Hi Dilip, I have one more question: +/* test for recovery pause, if user has requested the pause */ +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == +RECOVERY_PAUSE_REQUESTED) +recoveryPausesHere(false); + +now = GetCurrentTimestamp(); + Do we need now = GetCurrentTimestamp(); here? Because, I see that whenever the variable now is used within the for loop in WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being used within case XLOG_FROM_STREAM: Am I missing something? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Fri, Feb 5, 2021 at 10:06 AM Dilip Kumar wrote: > > On Fri, Feb 5, 2021 at 6:22 AM Bharath Rupireddy > wrote: > > > > On Thu, Feb 4, 2021 at 7:20 PM Dilip Kumar wrote: > > > > How can we do that this is not a 1 byte flag this is enum so I don't > > > > think we can read any atomic state without a spin lock here. > > > > > > I have fixed the other comments and the updated patch is attached. > > > > Can we just do like below so that we could use the existing > > SetRecoveryPause instead of setting the state outside? > > > > /* loop until recoveryPauseState is set to RECOVERY_NOT_PAUSED */ > > while (1) > > { > > RecoveryPauseState state; > > > > state = GetRecoveryPauseState(); > > > > if (state == RECOVERY_NOT_PAUSED) > > break; > > > > HandleStartupProcInterrupts(); > > > > if (CheckForStandbyTrigger()) > > return; > > pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE); > > > > /* > > * If recovery pause is requested then set it paused. While we are > > in > > * the loop, user might resume and pause again so set this every > > time. > > */ > > if (state == RECOVERY_PAUSE_REQUESTED) > > SetRecoveryPause(RECOVERY_PAUSED) > > We can not do that, basically, under one lock we need to check the > state and set it to pause. Because by the time you release the lock > someone might set it to RECOVERY_NOT_PAUSED then you don't want to set > it to RECOVERY_PAUSED. Got it. Thanks. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Fri, Feb 5, 2021 at 6:22 AM Bharath Rupireddy wrote: > > On Thu, Feb 4, 2021 at 7:20 PM Dilip Kumar wrote: > > > How can we do that this is not a 1 byte flag this is enum so I don't > > > think we can read any atomic state without a spin lock here. > > > > I have fixed the other comments and the updated patch is attached. > > Can we just do like below so that we could use the existing > SetRecoveryPause instead of setting the state outside? > > /* loop until recoveryPauseState is set to RECOVERY_NOT_PAUSED */ > while (1) > { > RecoveryPauseState state; > > state = GetRecoveryPauseState(); > > if (state == RECOVERY_NOT_PAUSED) > break; > > HandleStartupProcInterrupts(); > > if (CheckForStandbyTrigger()) > return; > pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE); > > /* > * If recovery pause is requested then set it paused. While we are in > * the loop, user might resume and pause again so set this every time. > */ > if (state == RECOVERY_PAUSE_REQUESTED) > SetRecoveryPause(RECOVERY_PAUSED) We can not do that, basically, under one lock we need to check the state and set it to pause. Because by the time you release the lock someone might set it to RECOVERY_NOT_PAUSED then you don't want to set it to RECOVERY_PAUSED. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Thu, Feb 4, 2021 at 10:19 PM Robert Haas wrote: > > On Thu, Feb 4, 2021 at 7:46 AM Dilip Kumar wrote: > > How can we do that this is not a 1 byte flag this is enum so I don't > > think we can read any atomic state without a spin lock here. > > I think this discussion of atomics is confused. Let's talk about what > atomic reads and writes mean. Imagine that you have a 64-bit value > 0x0101010101010101. Somebody sets it to 0x0202020202020202. Imagine > that just as they are doing that, someone else reads the value and > gets 0x0202020201010101, because half of the value has been updated > and the other half has not yet been updated yet. This kind of thing > can actually happen on some platforms and what it means is that on > those platforms 8-byte reads and writes are not atomic. The idea of an > "atom" is that it can't be split into pieces but these reads and > writes on some platforms are actually not "atomic" because they are > split into two 4-byte pieces. But there's no such thing as a 1-byte > read or write not being atomic. In theory you could imagine a computer > where when you change 0x01 to 0x23 and read in the middle and see 0x21 > or 0x13 or something, but no actual computers behave that way, or at > least no mainstream ones that anybody cares about. So the idea that > you somehow need a lock to prevent this is just wrong. > > Concurrent programs also suffer from another problem which is > reordering of operations, which can happen either as the program is > compiled or as the program is executed by the CPU. The CPU can see you > set a->x = 1 and a->y = 2 and decide to update y first and then x even > though you wrote it the other way around in the program text. To > prevent this, we have barrier operations; see README.barrier in the > source tree for a longer explanation. Atomic operations like > compare-and-exchange are also full barriers, so that they not only > prevent the torn read/write problem described above, but also enforce > order of operations more strictly. > > Now I don't know whether a lock is needed here or not. Maybe it is; > perhaps for consistency with other code, perhaps because the lock > acquire and release is serving the function of a barrier; or perhaps > to guard against some other hazard. But saying that it's because > reading or writing a 1-byte value might not be atomic does not sound > correct. I never told that reading /writing 1 byte is not atomic, of course, they are. I told that we can only guarantee that 1-byte read/write is atomic but this variable is not a bool or 1-byte value and the enum can take 32 bits on a 32-bit platform so we can not guarantee the atomic read/write on some processor so we need a lock. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Thu, Feb 4, 2021 at 7:20 PM Dilip Kumar wrote: > > How can we do that this is not a 1 byte flag this is enum so I don't > > think we can read any atomic state without a spin lock here. > > I have fixed the other comments and the updated patch is attached. Can we just do like below so that we could use the existing SetRecoveryPause instead of setting the state outside? /* loop until recoveryPauseState is set to RECOVERY_NOT_PAUSED */ while (1) { RecoveryPauseState state; state = GetRecoveryPauseState(); if (state == RECOVERY_NOT_PAUSED) break; HandleStartupProcInterrupts(); if (CheckForStandbyTrigger()) return; pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE); /* * If recovery pause is requested then set it paused. While we are in * the loop, user might resume and pause again so set this every time. */ if (state == RECOVERY_PAUSE_REQUESTED) SetRecoveryPause(RECOVERY_PAUSED) And a typo - it's "pg_is_wal_replay_paused" not "pg_is_wal_repay_paused" + pg_is_wal_repay_paused() returns whether a request With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Thu, Feb 4, 2021 at 7:46 AM Dilip Kumar wrote: > How can we do that this is not a 1 byte flag this is enum so I don't > think we can read any atomic state without a spin lock here. I think this discussion of atomics is confused. Let's talk about what atomic reads and writes mean. Imagine that you have a 64-bit value 0x0101010101010101. Somebody sets it to 0x0202020202020202. Imagine that just as they are doing that, someone else reads the value and gets 0x0202020201010101, because half of the value has been updated and the other half has not yet been updated yet. This kind of thing can actually happen on some platforms and what it means is that on those platforms 8-byte reads and writes are not atomic. The idea of an "atom" is that it can't be split into pieces but these reads and writes on some platforms are actually not "atomic" because they are split into two 4-byte pieces. But there's no such thing as a 1-byte read or write not being atomic. In theory you could imagine a computer where when you change 0x01 to 0x23 and read in the middle and see 0x21 or 0x13 or something, but no actual computers behave that way, or at least no mainstream ones that anybody cares about. So the idea that you somehow need a lock to prevent this is just wrong. Concurrent programs also suffer from another problem which is reordering of operations, which can happen either as the program is compiled or as the program is executed by the CPU. The CPU can see you set a->x = 1 and a->y = 2 and decide to update y first and then x even though you wrote it the other way around in the program text. To prevent this, we have barrier operations; see README.barrier in the source tree for a longer explanation. Atomic operations like compare-and-exchange are also full barriers, so that they not only prevent the torn read/write problem described above, but also enforce order of operations more strictly. Now I don't know whether a lock is needed here or not. Maybe it is; perhaps for consistency with other code, perhaps because the lock acquire and release is serving the function of a barrier; or perhaps to guard against some other hazard. But saying that it's because reading or writing a 1-byte value might not be atomic does not sound correct. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Thu, Feb 4, 2021 at 6:16 PM Dilip Kumar wrote: > > On Thu, Feb 4, 2021 at 4:58 PM Bharath Rupireddy > wrote: > > > > On Thu, Feb 4, 2021 at 10:28 AM Dilip Kumar wrote: > > > Please find an updated patch which addresses these comments. > > > > Thanks for the patch. I tested the new function > > pg_get_wal_replay_pause_state: > > > > postgres=# select pg_get_wal_replay_pause_state(); > > pg_get_wal_replay_pause_state > > --- > > not paused > > postgres=# select pg_wal_replay_pause(); > > pg_wal_replay_pause > > - > > > > (1 row) > > > > I can also see the "pause requested" state after I put a gdb > > breakpoint in WaitForWALToBecomeAvailable in the standby startup > > process . > > > > postgres=# select pg_get_wal_replay_pause_state(); > > pg_get_wal_replay_pause_state > > --- > > pause requested > > (1 row) > > > > postgres=# select pg_get_wal_replay_pause_state(); > > pg_get_wal_replay_pause_state > > --- > > paused > > (1 row) > > > > Mostly, the v10 patch looks good to me, except below minor comments: > > > > 1) A typo in commit message - "just check" --> "just checks" > > > > 2) How about > > +Returns recovery pause state. The return values are not > > paused > > instead of > > +Returns recovery pause state, the return values are not > > paused > > > > 3) I think it is 'get wal replay pause state', instead of { oid => > > '1137', descr => 'get wal replay is pause state', > > > > 4) can we just do this > > /* > > * If recovery pause is requested then set it paused. While we are > > in > > * the loop, user might resume and pause again so set this every > > time. > > */ > > if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > > RECOVERY_PAUSE_REQUESTED) > > SetRecoveryPause(RECOVERY_PAUSED); > > instead of > > /* > > * If recovery pause is requested then set it paused. While we are > > in > > * the loop, user might resume and pause again so set this every > > time. > > */ > > SpinLockAcquire(&XLogCtl->info_lck); > > if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED) > > XLogCtl->recoveryPauseState = RECOVERY_PAUSED; > > SpinLockRelease(&XLogCtl->info_lck); > > > > I think it's okay, since we take a spinlock anyways in > > GetRecoveryPauseState(). See the below comment and also a relevant > > commit 6ba4ecbf477e0b25dd7bde1b0c4e07fc2da19348 on why it's not > > necessary taking spinlock always: > > /* > > * Pause WAL replay, if requested by a hot-standby session > > via > > * SetRecoveryPause(). > > * > > * Note that we intentionally don't take the info_lck > > spinlock > > * here. We might therefore read a slightly stale value of > > * the recoveryPause flag, but it can't be very stale (no > > * worse than the last spinlock we did acquire). Since a > > * pause request is a pretty asynchronous thing anyway, > > * possibly responding to it one WAL record later than we > > * otherwise would is a minor issue, so it doesn't seem > > worth > > * adding another spinlock cycle to prevent that. > > */ > > How can we do that this is not a 1 byte flag this is enum so I don't > think we can read any atomic state without a spin lock here. I have fixed the other comments and the updated patch is attached. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From 0d7059a8809cbd28e7a4668c6b99e0c297c564f3 Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Wed, 27 Jan 2021 16:46:04 +0530 Subject: [PATCH v11] Provide a new interface to get the recovery pause status Currently, pg_is_wal_replay_paused, just checks whether the recovery pause is requested or not but it doesn't actually tell whether the recovery is actually paused or not. So basically there is no way for the user to know the actual status of the pause request. This patch provides a new interface pg_get_wal_replay_pause_state that will return the actual status of the recovery pause i.e.'not paused' if pause is not requested, 'pause requested' if pause is requested but recovery is not yet paused and 'paused' if recovery is actually paused. --- doc/src/sgml/func.sgml | 32 ++--- src/backend/access/transam/xlog.c | 64 +- src/backend/access/transam/xlogfuncs.c | 50 +++--- src/include/access/xlog.h | 12 +-- src/include/catalog/pg_proc.dat| 4 +++ 5 files changed, 134 insertions(+), 28 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index f30eaa3..eb131a6 100644 --- a/doc/src/sgml/func.sgml
Re: Is Recovery actually paused?
On Thu, Feb 4, 2021 at 4:58 PM Bharath Rupireddy wrote: > > On Thu, Feb 4, 2021 at 10:28 AM Dilip Kumar wrote: > > Please find an updated patch which addresses these comments. > > Thanks for the patch. I tested the new function pg_get_wal_replay_pause_state: > > postgres=# select pg_get_wal_replay_pause_state(); > pg_get_wal_replay_pause_state > --- > not paused > postgres=# select pg_wal_replay_pause(); > pg_wal_replay_pause > - > > (1 row) > > I can also see the "pause requested" state after I put a gdb > breakpoint in WaitForWALToBecomeAvailable in the standby startup > process . > > postgres=# select pg_get_wal_replay_pause_state(); > pg_get_wal_replay_pause_state > --- > pause requested > (1 row) > > postgres=# select pg_get_wal_replay_pause_state(); > pg_get_wal_replay_pause_state > --- > paused > (1 row) > > Mostly, the v10 patch looks good to me, except below minor comments: > > 1) A typo in commit message - "just check" --> "just checks" > > 2) How about > +Returns recovery pause state. The return values are not > paused > instead of > +Returns recovery pause state, the return values are not > paused > > 3) I think it is 'get wal replay pause state', instead of { oid => > '1137', descr => 'get wal replay is pause state', > > 4) can we just do this > /* > * If recovery pause is requested then set it paused. While we are in > * the loop, user might resume and pause again so set this every time. > */ > if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > RECOVERY_PAUSE_REQUESTED) > SetRecoveryPause(RECOVERY_PAUSED); > instead of > /* > * If recovery pause is requested then set it paused. While we are in > * the loop, user might resume and pause again so set this every time. > */ > SpinLockAcquire(&XLogCtl->info_lck); > if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED) > XLogCtl->recoveryPauseState = RECOVERY_PAUSED; > SpinLockRelease(&XLogCtl->info_lck); > > I think it's okay, since we take a spinlock anyways in > GetRecoveryPauseState(). See the below comment and also a relevant > commit 6ba4ecbf477e0b25dd7bde1b0c4e07fc2da19348 on why it's not > necessary taking spinlock always: > /* > * Pause WAL replay, if requested by a hot-standby session via > * SetRecoveryPause(). > * > * Note that we intentionally don't take the info_lck spinlock > * here. We might therefore read a slightly stale value of > * the recoveryPause flag, but it can't be very stale (no > * worse than the last spinlock we did acquire). Since a > * pause request is a pretty asynchronous thing anyway, > * possibly responding to it one WAL record later than we > * otherwise would is a minor issue, so it doesn't seem worth > * adding another spinlock cycle to prevent that. > */ How can we do that this is not a 1 byte flag this is enum so I don't think we can read any atomic state without a spin lock here. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Thu, Feb 4, 2021 at 4:58 PM Bharath Rupireddy wrote: > > On Thu, Feb 4, 2021 at 10:28 AM Dilip Kumar wrote: > > Please find an updated patch which addresses these comments. > > Thanks for the patch. I tested the new function pg_get_wal_replay_pause_state: > > postgres=# select pg_get_wal_replay_pause_state(); > pg_get_wal_replay_pause_state > --- > not paused > postgres=# select pg_wal_replay_pause(); > pg_wal_replay_pause > - > > (1 row) > > I can also see the "pause requested" state after I put a gdb > breakpoint in WaitForWALToBecomeAvailable in the standby startup > process . > > postgres=# select pg_get_wal_replay_pause_state(); > pg_get_wal_replay_pause_state > --- > pause requested > (1 row) > > postgres=# select pg_get_wal_replay_pause_state(); > pg_get_wal_replay_pause_state > --- > paused > (1 row) > > Mostly, the v10 patch looks good to me, except below minor comments: Thanks for the testing. > 1) A typo in commit message - "just check" --> "just checks" > > 2) How about > +Returns recovery pause state. The return values are not > paused > instead of > +Returns recovery pause state, the return values are not > paused > > 3) I think it is 'get wal replay pause state', instead of { oid => > '1137', descr => 'get wal replay is pause state', > > 4) can we just do this > /* > * If recovery pause is requested then set it paused. While we are in > * the loop, user might resume and pause again so set this every time. > */ > if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == > RECOVERY_PAUSE_REQUESTED) > SetRecoveryPause(RECOVERY_PAUSED); > instead of > /* > * If recovery pause is requested then set it paused. While we are in > * the loop, user might resume and pause again so set this every time. > */ > SpinLockAcquire(&XLogCtl->info_lck); > if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED) > XLogCtl->recoveryPauseState = RECOVERY_PAUSED; > SpinLockRelease(&XLogCtl->info_lck); > > I think it's okay, since we take a spinlock anyways in > GetRecoveryPauseState(). See the below comment and also a relevant > commit 6ba4ecbf477e0b25dd7bde1b0c4e07fc2da19348 on why it's not > necessary taking spinlock always: > /* > * Pause WAL replay, if requested by a hot-standby session via > * SetRecoveryPause(). > * > * Note that we intentionally don't take the info_lck spinlock > * here. We might therefore read a slightly stale value of > * the recoveryPause flag, but it can't be very stale (no > * worse than the last spinlock we did acquire). Since a > * pause request is a pretty asynchronous thing anyway, > * possibly responding to it one WAL record later than we > * otherwise would is a minor issue, so it doesn't seem worth > * adding another spinlock cycle to prevent that. > */ > I will work on these comments and send the updated patch soon. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Thu, Feb 4, 2021 at 10:28 AM Dilip Kumar wrote: > Please find an updated patch which addresses these comments. Thanks for the patch. I tested the new function pg_get_wal_replay_pause_state: postgres=# select pg_get_wal_replay_pause_state(); pg_get_wal_replay_pause_state --- not paused postgres=# select pg_wal_replay_pause(); pg_wal_replay_pause - (1 row) I can also see the "pause requested" state after I put a gdb breakpoint in WaitForWALToBecomeAvailable in the standby startup process . postgres=# select pg_get_wal_replay_pause_state(); pg_get_wal_replay_pause_state --- pause requested (1 row) postgres=# select pg_get_wal_replay_pause_state(); pg_get_wal_replay_pause_state --- paused (1 row) Mostly, the v10 patch looks good to me, except below minor comments: 1) A typo in commit message - "just check" --> "just checks" 2) How about +Returns recovery pause state. The return values are not paused instead of +Returns recovery pause state, the return values are not paused 3) I think it is 'get wal replay pause state', instead of { oid => '1137', descr => 'get wal replay is pause state', 4) can we just do this /* * If recovery pause is requested then set it paused. While we are in * the loop, user might resume and pause again so set this every time. */ if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState == RECOVERY_PAUSE_REQUESTED) SetRecoveryPause(RECOVERY_PAUSED); instead of /* * If recovery pause is requested then set it paused. While we are in * the loop, user might resume and pause again so set this every time. */ SpinLockAcquire(&XLogCtl->info_lck); if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED) XLogCtl->recoveryPauseState = RECOVERY_PAUSED; SpinLockRelease(&XLogCtl->info_lck); I think it's okay, since we take a spinlock anyways in GetRecoveryPauseState(). See the below comment and also a relevant commit 6ba4ecbf477e0b25dd7bde1b0c4e07fc2da19348 on why it's not necessary taking spinlock always: /* * Pause WAL replay, if requested by a hot-standby session via * SetRecoveryPause(). * * Note that we intentionally don't take the info_lck spinlock * here. We might therefore read a slightly stale value of * the recoveryPause flag, but it can't be very stale (no * worse than the last spinlock we did acquire). Since a * pause request is a pretty asynchronous thing anyway, * possibly responding to it one WAL record later than we * otherwise would is a minor issue, so it doesn't seem worth * adding another spinlock cycle to prevent that. */ With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Mon, Feb 1, 2021 at 1:41 PM Dilip Kumar wrote: > > On Mon, Feb 1, 2021 at 11:59 AM Kyotaro Horiguchi > wrote: > > > > At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar > > wrote in > > > On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar wrote: > > > > > > > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA wrote: > > > > > > > > > > On Thu, 28 Jan 2021 09:55:42 +0530 > > > > > Dilip Kumar wrote: > > > > > > > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar > > > > > > wrote: > > > > > > > > > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA > > > > > > > wrote: > > > > > > > > > > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > > > > > > > Dilip Kumar wrote: > > > > > > > > > > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > > > > > > > wrote: > > > > > > > > > > > > +1 to just show the recovery pause state in the output > > > > > > > > > > > > of > > > > > > > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, > > > > > > > > > > > > when "is" exists > > > > > > > > > > > > in a function, I expect a boolean output. Others may > > > > > > > > > > > > have better > > > > > > > > > > > > thoughts. > > > > > > > > > > > > > > > > > > > > > > Maybe we should leave the existing function > > > > > > > > > > > pg_is_wal_replay_paused() > > > > > > > > > > > alone and add a new one with the name you suggest that > > > > > > > > > > > returns text. > > > > > > > > > > > That would create less burden for tool authors. > > > > > > > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > > > > > > > > > > > > > This means pg_is_wal_replay_paused is left without any change > > > > > > > > and this > > > > > > > > returns whether pause is requested or not? If so, it seems good > > > > > > > > to modify > > > > > > > > the documentation of this function in order to note that this > > > > > > > > could not > > > > > > > > return the actual pause state. > > > > > > > > > > > > > > Yes, we can say that it will return true if the replay pause is > > > > > > > requested. I am changing that in my new patch. > > > > > > > > > > > > I have modified the patch, changes > > > > > > > > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get > > > > > > the pause request state > > > > > > - Now, we are not waiting for the recovery to actually get paused > > > > > > so I > > > > > > think it doesn't make sense to put a lot of checkpoints to check the > > > > > > pause requested so I have removed that check from the > > > > > > recoveryApplyDelay but I think it better we still keep that check in > > > > > > the WaitForWalToBecomeAvailable because it can wait forever before > > > > > > the > > > > > > next wal get available. > > > > > > > > > > I think basically the check in WaitForWalToBecomeAvailable is > > > > > independent > > > > > of the feature of pg_get_wal_replay_pause_state, that is, reporting > > > > > the > > > > > actual pause state. This function could just return 'pause requested' > > > > > if a pause is requested during waiting for WAL. > > > > > > > > > > However, I agree the change to allow recovery to transit the state to > > > > > 'paused' during WAL waiting because 'paused' has more useful > > > > > information > > > > > for users than 'pause requested'. Returning 'paused' lets users know > > > > > clearly that no more WAL are applied until recovery is resumed. On > > > > > the > > > > > other hand, when 'pause requested' is returned, user can't say whether > > > > > the next WAL wiill be applied or not from this information. > > > > > > > > > > For the same reason, I think it is also useful to call > > > > > recoveryPausesHere > > > > > in recoveryApplyDelay. > > > > > > > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get > > > > available so it can not be controlled by user so it is good to put a > > > > check for the recovery pause, however recoveryApplyDelay wait for the > > > > apply delay which is configured by user and it is predictable value by > > > > the user. I don't have much objection to putting that check in the > > > > recoveryApplyDelay as well but I feel it is not necessary. Any other > > > > thoughts on this? > > > > > > > > > In addition, in RecoveryRequiresIntParameter, recovery should get > > > > > paused > > > > > if a parameter value has a problem. However, > > > > > pg_get_wal_replay_pause_state > > > > > will return 'pause requested' in this case. So, I think, we should > > > > > pass > > > > > RECOVERY_P
Re: Is Recovery actually paused?
On Mon, Feb 1, 2021 at 11:59 AM Kyotaro Horiguchi wrote: > > At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar wrote > in > > On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar wrote: > > > > > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA wrote: > > > > > > > > On Thu, 28 Jan 2021 09:55:42 +0530 > > > > Dilip Kumar wrote: > > > > > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar > > > > > wrote: > > > > > > > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA > > > > > > wrote: > > > > > > > > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > > > > > > Dilip Kumar wrote: > > > > > > > > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > > > > > > wrote: > > > > > > > > > > > +1 to just show the recovery pause state in the output of > > > > > > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, > > > > > > > > > > > when "is" exists > > > > > > > > > > > in a function, I expect a boolean output. Others may have > > > > > > > > > > > better > > > > > > > > > > > thoughts. > > > > > > > > > > > > > > > > > > > > Maybe we should leave the existing function > > > > > > > > > > pg_is_wal_replay_paused() > > > > > > > > > > alone and add a new one with the name you suggest that > > > > > > > > > > returns text. > > > > > > > > > > That would create less burden for tool authors. > > > > > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > > > > > > > > > > > This means pg_is_wal_replay_paused is left without any change and > > > > > > > this > > > > > > > returns whether pause is requested or not? If so, it seems good > > > > > > > to modify > > > > > > > the documentation of this function in order to note that this > > > > > > > could not > > > > > > > return the actual pause state. > > > > > > > > > > > > Yes, we can say that it will return true if the replay pause is > > > > > > requested. I am changing that in my new patch. > > > > > > > > > > I have modified the patch, changes > > > > > > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get > > > > > the pause request state > > > > > - Now, we are not waiting for the recovery to actually get paused so I > > > > > think it doesn't make sense to put a lot of checkpoints to check the > > > > > pause requested so I have removed that check from the > > > > > recoveryApplyDelay but I think it better we still keep that check in > > > > > the WaitForWalToBecomeAvailable because it can wait forever before the > > > > > next wal get available. > > > > > > > > I think basically the check in WaitForWalToBecomeAvailable is > > > > independent > > > > of the feature of pg_get_wal_replay_pause_state, that is, reporting the > > > > actual pause state. This function could just return 'pause requested' > > > > if a pause is requested during waiting for WAL. > > > > > > > > However, I agree the change to allow recovery to transit the state to > > > > 'paused' during WAL waiting because 'paused' has more useful information > > > > for users than 'pause requested'. Returning 'paused' lets users know > > > > clearly that no more WAL are applied until recovery is resumed. On the > > > > other hand, when 'pause requested' is returned, user can't say whether > > > > the next WAL wiill be applied or not from this information. > > > > > > > > For the same reason, I think it is also useful to call > > > > recoveryPausesHere > > > > in recoveryApplyDelay. > > > > > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get > > > available so it can not be controlled by user so it is good to put a > > > check for the recovery pause, however recoveryApplyDelay wait for the > > > apply delay which is configured by user and it is predictable value by > > > the user. I don't have much objection to putting that check in the > > > recoveryApplyDelay as well but I feel it is not necessary. Any other > > > thoughts on this? > > > > > > > In addition, in RecoveryRequiresIntParameter, recovery should get paused > > > > if a parameter value has a problem. However, > > > > pg_get_wal_replay_pause_state > > > > will return 'pause requested' in this case. So, I think, we should pass > > > > RECOVERY_PAUSED to SetRecoveryPause() instead of > > > > RECOVERY_PAUSE_REQUESTED, > > > > or call CheckAndSetRecoveryPause() in the loop like > > > > recoveryPausesHere(). > > > > > > Yeah, absolutely right, it must pass RECOVERY_PAUSED. I will change > > > this, thanks for noticing this. > > > > I have changed this in the new patch. > > It seems to work well. The
Re: Is Recovery actually paused?
At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar wrote in > On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar wrote: > > > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA wrote: > > > > > > On Thu, 28 Jan 2021 09:55:42 +0530 > > > Dilip Kumar wrote: > > > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar > > > > wrote: > > > > > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA > > > > > wrote: > > > > > > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > > > > > Dilip Kumar wrote: > > > > > > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > > > > > wrote: > > > > > > > > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > > > > > wrote: > > > > > > > > > > +1 to just show the recovery pause state in the output of > > > > > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when > > > > > > > > > > "is" exists > > > > > > > > > > in a function, I expect a boolean output. Others may have > > > > > > > > > > better > > > > > > > > > > thoughts. > > > > > > > > > > > > > > > > > > Maybe we should leave the existing function > > > > > > > > > pg_is_wal_replay_paused() > > > > > > > > > alone and add a new one with the name you suggest that > > > > > > > > > returns text. > > > > > > > > > That would create less burden for tool authors. > > > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > > > > > > > > > This means pg_is_wal_replay_paused is left without any change and > > > > > > this > > > > > > returns whether pause is requested or not? If so, it seems good to > > > > > > modify > > > > > > the documentation of this function in order to note that this could > > > > > > not > > > > > > return the actual pause state. > > > > > > > > > > Yes, we can say that it will return true if the replay pause is > > > > > requested. I am changing that in my new patch. > > > > > > > > I have modified the patch, changes > > > > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get > > > > the pause request state > > > > - Now, we are not waiting for the recovery to actually get paused so I > > > > think it doesn't make sense to put a lot of checkpoints to check the > > > > pause requested so I have removed that check from the > > > > recoveryApplyDelay but I think it better we still keep that check in > > > > the WaitForWalToBecomeAvailable because it can wait forever before the > > > > next wal get available. > > > > > > I think basically the check in WaitForWalToBecomeAvailable is independent > > > of the feature of pg_get_wal_replay_pause_state, that is, reporting the > > > actual pause state. This function could just return 'pause requested' > > > if a pause is requested during waiting for WAL. > > > > > > However, I agree the change to allow recovery to transit the state to > > > 'paused' during WAL waiting because 'paused' has more useful information > > > for users than 'pause requested'. Returning 'paused' lets users know > > > clearly that no more WAL are applied until recovery is resumed. On the > > > other hand, when 'pause requested' is returned, user can't say whether > > > the next WAL wiill be applied or not from this information. > > > > > > For the same reason, I think it is also useful to call recoveryPausesHere > > > in recoveryApplyDelay. > > > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get > > available so it can not be controlled by user so it is good to put a > > check for the recovery pause, however recoveryApplyDelay wait for the > > apply delay which is configured by user and it is predictable value by > > the user. I don't have much objection to putting that check in the > > recoveryApplyDelay as well but I feel it is not necessary. Any other > > thoughts on this? > > > > > In addition, in RecoveryRequiresIntParameter, recovery should get paused > > > if a parameter value has a problem. However, > > > pg_get_wal_replay_pause_state > > > will return 'pause requested' in this case. So, I think, we should pass > > > RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED, > > > or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere(). > > > > Yeah, absolutely right, it must pass RECOVERY_PAUSED. I will change > > this, thanks for noticing this. > > I have changed this in the new patch. It seems to work well. The checkpoints seems to be placed properly. +SetRecoveryPause(RecoveryPauseState state) { + Assert(state >= RECOVERY_NOT_PAUSED && state <= RECOVERY_PAUSED); I'm not sure that state worth FATAL. Isn't it enough to just ERROR out like XLogFileRead? CheckAndSetRecovery() has only one caller. I
Re: Is Recovery actually paused?
On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar wrote: > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA wrote: > > > > On Thu, 28 Jan 2021 09:55:42 +0530 > > Dilip Kumar wrote: > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar wrote: > > > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA wrote: > > > > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > > > > Dilip Kumar wrote: > > > > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > > > > wrote: > > > > > > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > > > > wrote: > > > > > > > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > > > > wrote: > > > > > > > > > +1 to just show the recovery pause state in the output of > > > > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when > > > > > > > > > "is" exists > > > > > > > > > in a function, I expect a boolean output. Others may have > > > > > > > > > better > > > > > > > > > thoughts. > > > > > > > > > > > > > > > > Maybe we should leave the existing function > > > > > > > > pg_is_wal_replay_paused() > > > > > > > > alone and add a new one with the name you suggest that returns > > > > > > > > text. > > > > > > > > That would create less burden for tool authors. > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > > > > > > > This means pg_is_wal_replay_paused is left without any change and this > > > > > returns whether pause is requested or not? If so, it seems good to > > > > > modify > > > > > the documentation of this function in order to note that this could > > > > > not > > > > > return the actual pause state. > > > > > > > > Yes, we can say that it will return true if the replay pause is > > > > requested. I am changing that in my new patch. > > > > > > I have modified the patch, changes > > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get > > > the pause request state > > > - Now, we are not waiting for the recovery to actually get paused so I > > > think it doesn't make sense to put a lot of checkpoints to check the > > > pause requested so I have removed that check from the > > > recoveryApplyDelay but I think it better we still keep that check in > > > the WaitForWalToBecomeAvailable because it can wait forever before the > > > next wal get available. > > > > I think basically the check in WaitForWalToBecomeAvailable is independent > > of the feature of pg_get_wal_replay_pause_state, that is, reporting the > > actual pause state. This function could just return 'pause requested' > > if a pause is requested during waiting for WAL. > > > > However, I agree the change to allow recovery to transit the state to > > 'paused' during WAL waiting because 'paused' has more useful information > > for users than 'pause requested'. Returning 'paused' lets users know > > clearly that no more WAL are applied until recovery is resumed. On the > > other hand, when 'pause requested' is returned, user can't say whether > > the next WAL wiill be applied or not from this information. > > > > For the same reason, I think it is also useful to call recoveryPausesHere > > in recoveryApplyDelay. > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get > available so it can not be controlled by user so it is good to put a > check for the recovery pause, however recoveryApplyDelay wait for the > apply delay which is configured by user and it is predictable value by > the user. I don't have much objection to putting that check in the > recoveryApplyDelay as well but I feel it is not necessary. Any other > thoughts on this? > > > In addition, in RecoveryRequiresIntParameter, recovery should get paused > > if a parameter value has a problem. However, pg_get_wal_replay_pause_state > > will return 'pause requested' in this case. So, I think, we should pass > > RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED, > > or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere(). > > Yeah, absolutely right, it must pass RECOVERY_PAUSED. I will change > this, thanks for noticing this. I have changed this in the new patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com v9-0001-Provide-a-new-interface-to-get-the-recovery-pause.patch Description: Binary data
Re: Is Recovery actually paused?
On Fri, 29 Jan 2021 16:33:32 +0530 Dilip Kumar wrote: > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA wrote: > > > > On Thu, 28 Jan 2021 09:55:42 +0530 > > Dilip Kumar wrote: > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar wrote: > > > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA wrote: > > > > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > > > > Dilip Kumar wrote: > > > > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > > > > wrote: > > > > > > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > > > > wrote: > > > > > > > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > > > > wrote: > > > > > > > > > +1 to just show the recovery pause state in the output of > > > > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when > > > > > > > > > "is" exists > > > > > > > > > in a function, I expect a boolean output. Others may have > > > > > > > > > better > > > > > > > > > thoughts. > > > > > > > > > > > > > > > > Maybe we should leave the existing function > > > > > > > > pg_is_wal_replay_paused() > > > > > > > > alone and add a new one with the name you suggest that returns > > > > > > > > text. > > > > > > > > That would create less burden for tool authors. > > > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > > > > > > > This means pg_is_wal_replay_paused is left without any change and this > > > > > returns whether pause is requested or not? If so, it seems good to > > > > > modify > > > > > the documentation of this function in order to note that this could > > > > > not > > > > > return the actual pause state. > > > > > > > > Yes, we can say that it will return true if the replay pause is > > > > requested. I am changing that in my new patch. > > > > > > I have modified the patch, changes > > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get > > > the pause request state > > > - Now, we are not waiting for the recovery to actually get paused so I > > > think it doesn't make sense to put a lot of checkpoints to check the > > > pause requested so I have removed that check from the > > > recoveryApplyDelay but I think it better we still keep that check in > > > the WaitForWalToBecomeAvailable because it can wait forever before the > > > next wal get available. > > > > I think basically the check in WaitForWalToBecomeAvailable is independent > > of the feature of pg_get_wal_replay_pause_state, that is, reporting the > > actual pause state. This function could just return 'pause requested' > > if a pause is requested during waiting for WAL. > > > > However, I agree the change to allow recovery to transit the state to > > 'paused' during WAL waiting because 'paused' has more useful information > > for users than 'pause requested'. Returning 'paused' lets users know > > clearly that no more WAL are applied until recovery is resumed. On the > > other hand, when 'pause requested' is returned, user can't say whether > > the next WAL wiill be applied or not from this information. > > > > For the same reason, I think it is also useful to call recoveryPausesHere > > in recoveryApplyDelay. > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get > available so it can not be controlled by user so it is good to put a > check for the recovery pause, however recoveryApplyDelay wait for the > apply delay which is configured by user and it is predictable value by > the user. I don't have much objection to putting that check in the > recoveryApplyDelay as well but I feel it is not necessary. Any other > thoughts on this? I'm not sure if the user can figure out easily that the reason why pg_get_wal_replay_pause_state returns 'pause requested' is due to recovery_min_apply_delay because it would needs knowledge of the internal mechanism of recovery. However, if there are not any other opinions of it, I don't care that recoveryApplyDelay is left as is because such check and state transition is independent of the goal of pg_get_wal_replay_pause_state itself as I mentioned above. Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA wrote: > > On Thu, 28 Jan 2021 09:55:42 +0530 > Dilip Kumar wrote: > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar wrote: > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA wrote: > > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > > > Dilip Kumar wrote: > > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > > > wrote: > > > > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > > > wrote: > > > > > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > > > wrote: > > > > > > > > +1 to just show the recovery pause state in the output of > > > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" > > > > > > > > exists > > > > > > > > in a function, I expect a boolean output. Others may have better > > > > > > > > thoughts. > > > > > > > > > > > > > > Maybe we should leave the existing function > > > > > > > pg_is_wal_replay_paused() > > > > > > > alone and add a new one with the name you suggest that returns > > > > > > > text. > > > > > > > That would create less burden for tool authors. > > > > > > > > > > > > +1 > > > > > > > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > > > > > This means pg_is_wal_replay_paused is left without any change and this > > > > returns whether pause is requested or not? If so, it seems good to > > > > modify > > > > the documentation of this function in order to note that this could not > > > > return the actual pause state. > > > > > > Yes, we can say that it will return true if the replay pause is > > > requested. I am changing that in my new patch. > > > > I have modified the patch, changes > > > > - I have added a new interface pg_get_wal_replay_pause_state to get > > the pause request state > > - Now, we are not waiting for the recovery to actually get paused so I > > think it doesn't make sense to put a lot of checkpoints to check the > > pause requested so I have removed that check from the > > recoveryApplyDelay but I think it better we still keep that check in > > the WaitForWalToBecomeAvailable because it can wait forever before the > > next wal get available. > > I think basically the check in WaitForWalToBecomeAvailable is independent > of the feature of pg_get_wal_replay_pause_state, that is, reporting the > actual pause state. This function could just return 'pause requested' > if a pause is requested during waiting for WAL. > > However, I agree the change to allow recovery to transit the state to > 'paused' during WAL waiting because 'paused' has more useful information > for users than 'pause requested'. Returning 'paused' lets users know > clearly that no more WAL are applied until recovery is resumed. On the > other hand, when 'pause requested' is returned, user can't say whether > the next WAL wiill be applied or not from this information. > > For the same reason, I think it is also useful to call recoveryPausesHere > in recoveryApplyDelay. IMHO the WaitForWalToBecomeAvailable can wait until the next wal get available so it can not be controlled by user so it is good to put a check for the recovery pause, however recoveryApplyDelay wait for the apply delay which is configured by user and it is predictable value by the user. I don't have much objection to putting that check in the recoveryApplyDelay as well but I feel it is not necessary. Any other thoughts on this? > In addition, in RecoveryRequiresIntParameter, recovery should get paused > if a parameter value has a problem. However, pg_get_wal_replay_pause_state > will return 'pause requested' in this case. So, I think, we should pass > RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED, > or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere(). Yeah, absolutely right, it must pass RECOVERY_PAUSED. I will change this, thanks for noticing this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Thu, 28 Jan 2021 09:55:42 +0530 Dilip Kumar wrote: > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar wrote: > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA wrote: > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > > Dilip Kumar wrote: > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > > wrote: > > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > > wrote: > > > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > > wrote: > > > > > > > +1 to just show the recovery pause state in the output of > > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" > > > > > > > exists > > > > > > > in a function, I expect a boolean output. Others may have better > > > > > > > thoughts. > > > > > > > > > > > > Maybe we should leave the existing function > > > > > > pg_is_wal_replay_paused() > > > > > > alone and add a new one with the name you suggest that returns text. > > > > > > That would create less burden for tool authors. > > > > > > > > > > +1 > > > > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > > > This means pg_is_wal_replay_paused is left without any change and this > > > returns whether pause is requested or not? If so, it seems good to modify > > > the documentation of this function in order to note that this could not > > > return the actual pause state. > > > > Yes, we can say that it will return true if the replay pause is > > requested. I am changing that in my new patch. > > I have modified the patch, changes > > - I have added a new interface pg_get_wal_replay_pause_state to get > the pause request state > - Now, we are not waiting for the recovery to actually get paused so I > think it doesn't make sense to put a lot of checkpoints to check the > pause requested so I have removed that check from the > recoveryApplyDelay but I think it better we still keep that check in > the WaitForWalToBecomeAvailable because it can wait forever before the > next wal get available. I think basically the check in WaitForWalToBecomeAvailable is independent of the feature of pg_get_wal_replay_pause_state, that is, reporting the actual pause state. This function could just return 'pause requested' if a pause is requested during waiting for WAL. However, I agree the change to allow recovery to transit the state to 'paused' during WAL waiting because 'paused' has more useful information for users than 'pause requested'. Returning 'paused' lets users know clearly that no more WAL are applied until recovery is resumed. On the other hand, when 'pause requested' is returned, user can't say whether the next WAL wiill be applied or not from this information. For the same reason, I think it is also useful to call recoveryPausesHere in recoveryApplyDelay. In addition, in RecoveryRequiresIntParameter, recovery should get paused if a parameter value has a problem. However, pg_get_wal_replay_pause_state will return 'pause requested' in this case. So, I think, we should pass RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED, or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere(). Regrads, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar wrote: > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA wrote: > > > > On Wed, 27 Jan 2021 13:29:23 +0530 > > Dilip Kumar wrote: > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > > wrote: > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas > > > > wrote: > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > > wrote: > > > > > > +1 to just show the recovery pause state in the output of > > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > > "pg_is_wal_replay_paused" be something like > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" > > > > > > exists > > > > > > in a function, I expect a boolean output. Others may have better > > > > > > thoughts. > > > > > > > > > > Maybe we should leave the existing function pg_is_wal_replay_paused() > > > > > alone and add a new one with the name you suggest that returns text. > > > > > That would create less burden for tool authors. > > > > > > > > +1 > > > > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > > > This means pg_is_wal_replay_paused is left without any change and this > > returns whether pause is requested or not? If so, it seems good to modify > > the documentation of this function in order to note that this could not > > return the actual pause state. > > Yes, we can say that it will return true if the replay pause is > requested. I am changing that in my new patch. I have modified the patch, changes - I have added a new interface pg_get_wal_replay_pause_state to get the pause request state - Now, we are not waiting for the recovery to actually get paused so I think it doesn't make sense to put a lot of checkpoints to check the pause requested so I have removed that check from the recoveryApplyDelay but I think it better we still keep that check in the WaitForWalToBecomeAvailable because it can wait forever before the next wal get available. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com v8-0001-Provide-a-new-interface-to-get-the-recovery-pause.patch Description: Binary data
Re: Is Recovery actually paused?
On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA wrote: > > On Wed, 27 Jan 2021 13:29:23 +0530 > Dilip Kumar wrote: > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > > wrote: > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas wrote: > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > > wrote: > > > > > +1 to just show the recovery pause state in the output of > > > > > pg_is_wal_replay_paused. But, should the function name > > > > > "pg_is_wal_replay_paused" be something like > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > > > > > in a function, I expect a boolean output. Others may have better > > > > > thoughts. > > > > > > > > Maybe we should leave the existing function pg_is_wal_replay_paused() > > > > alone and add a new one with the name you suggest that returns text. > > > > That would create less burden for tool authors. > > > > > > +1 > > > > > > > Yeah, we can do that, I will send an updated patch soon. > > This means pg_is_wal_replay_paused is left without any change and this > returns whether pause is requested or not? If so, it seems good to modify > the documentation of this function in order to note that this could not > return the actual pause state. Yes, we can say that it will return true if the replay pause is requested. I am changing that in my new patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Wed, 27 Jan 2021 13:29:23 +0530 Dilip Kumar wrote: > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada > wrote: > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas wrote: > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > > wrote: > > > > +1 to just show the recovery pause state in the output of > > > > pg_is_wal_replay_paused. But, should the function name > > > > "pg_is_wal_replay_paused" be something like > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > > > > in a function, I expect a boolean output. Others may have better > > > > thoughts. > > > > > > Maybe we should leave the existing function pg_is_wal_replay_paused() > > > alone and add a new one with the name you suggest that returns text. > > > That would create less burden for tool authors. > > > > +1 > > > > Yeah, we can do that, I will send an updated patch soon. This means pg_is_wal_replay_paused is left without any change and this returns whether pause is requested or not? If so, it seems good to modify the documentation of this function in order to note that this could not return the actual pause state. Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada wrote: > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas wrote: > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > > wrote: > > > +1 to just show the recovery pause state in the output of > > > pg_is_wal_replay_paused. But, should the function name > > > "pg_is_wal_replay_paused" be something like > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > > > in a function, I expect a boolean output. Others may have better > > > thoughts. > > > > Maybe we should leave the existing function pg_is_wal_replay_paused() > > alone and add a new one with the name you suggest that returns text. > > That would create less burden for tool authors. > > +1 > Yeah, we can do that, I will send an updated patch soon. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Tue, Jan 26, 2021 at 2:00 AM Robert Haas wrote: > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy > wrote: > > +1 to just show the recovery pause state in the output of > > pg_is_wal_replay_paused. But, should the function name > > "pg_is_wal_replay_paused" be something like > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > > in a function, I expect a boolean output. Others may have better > > thoughts. > > Maybe we should leave the existing function pg_is_wal_replay_paused() > alone and add a new one with the name you suggest that returns text. > That would create less burden for tool authors. +1 Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Is Recovery actually paused?
On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy wrote: > +1 to just show the recovery pause state in the output of > pg_is_wal_replay_paused. But, should the function name > "pg_is_wal_replay_paused" be something like > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > in a function, I expect a boolean output. Others may have better > thoughts. Maybe we should leave the existing function pg_is_wal_replay_paused() alone and add a new one with the name you suggest that returns text. That would create less burden for tool authors. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Mon, 25 Jan 2021 14:53:18 +0530 Dilip Kumar wrote: > I have changed as per other functions for consistency. Thank you for updating the patch. Here are a few comments: (1) - SetRecoveryPause(true); + SetRecoveryPause(RECOVERY_PAUSE_REQUESTED); ereport(LOG (errmsg("recovery has paused"), errdetail("If recovery is unpaused, the server will shut down."), errhint("You can then restart the server after making the necessary configuration changes."))); - while (RecoveryIsPaused()) + while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) { HandleStartupProcInterrupts(); This fix would be required for code added by the following commit. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=15251c0a60be76eedee74ac0e94b433f9acca5af Due to this, the recovery could get paused after the configuration change in the primary. However, after applying this patch, pg_is_wal_replay_paused returns "pause requested" although it should return "paused". To fix this, we must pass RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED. Or, we can call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere(), but this seems redundant. (2) - while (RecoveryIsPaused()) + while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED) { + HandleStartupProcInterrupts(); Though it is trivial, I think the new line after the brace is unnecessary. Regards, Yugo Nagata -- Yugo NAGATA
Re: Is Recovery actually paused?
On Mon, Jan 25, 2021 at 2:53 PM Dilip Kumar wrote: > I have changed as per other functions for consistency. Thanks for the v7 patch. Here are some quick comments on it: [1] I think we need to change return value from boolean to text in documentation: pg_is_wal_replay_paused pg_is_wal_replay_paused () boolean [2] Do we intentionally ignore the return type of below function? If yes, can we change the return type to void and change the function comment? If we do care about the return value, shouldn't we use it? static bool recoveryApplyDelay(XLogReaderState *record); +recoveryApplyDelay(xlogreader); [3] Although it's not necessary, I just thought, it will be good to have an example for the new output of pg_is_wal_replay_paused in the documentation, something like below for brin_page_type: test=# SELECT brin_page_type(get_raw_page('brinidx', 0)); brin_page_type meta With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Sun, Jan 24, 2021 at 12:17 PM Bharath Rupireddy wrote: > > On Sun, Jan 24, 2021 at 11:29 AM Dilip Kumar wrote: > > > Some comments on the v6 patch: > > > > [2] Typo - it's "requested" + * 'paused requested' - if pause is > > > reqested but recovery is not yet paused > > Here I meant the typo "reqested" in "if pause is reqested but recovery > is not yet paused" statement from v6 patch. Ok > > > [3] I think it's "+ * 'pause requested'" instead of "+ * 'paused > > > requested'" > > > > Which code does it refer, can give put the snippet from the patch. > > However, I have found there were 'paused requested' in two places so I > > have fixed. > > Thanks. > > > > [6] As I mentioned upthread, isn't it better to have > > > "IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"? > > > > That is an existing function so I think it's fine to keep the same name. > > Personally, I think the function RecoveryIsPaused itself is > unnecessary with the new function GetRecoveryPauseState introduced in > your patch. IMHO, we can remove it. If not okay, then we are at it, > can we at least change the function name to be meaningful > "IsRecoveryPaused"? Others may have better thoughts than me. I have removed this function > > > [7] Can we have the function variable name "recoveryPause" as "state" > > > or "pauseState? Because that variable context is set by the enum name > > > RecoveryPauseState and the function name. > > > > > > +SetRecoveryPause(RecoveryPauseState recoveryPause) > > > > > > Here as well, "recoveryPauseState" to "state"? > > > +GetRecoveryPauseState(void) > > > { > > > -boolrecoveryPause; > > > +RecoveryPauseStaterecoveryPauseState; > > > > I don't think it is required but while changing the patch I will see > > whether to change or not. > > It will be good to change that. I personally don't like structure > names and variable names to be the same. Changed to state > > > [6] Function name RecoveryIsPaused and it's comment "Check whether the > > > recovery pause is requested." doesn't seem to be matching. Seems like > > > it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED. > > > Should it return true only when the state is RECOVERY_PAUSE_REQUESTED? > > > > Code is doing right, I will change the comments. > > > > > Instead of "while (RecoveryIsPaused())", can't we change it to "while > > > (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the > > > RecoveryIsPaused()? > > > > I think it looks clean with the function > > As I said earlier, I see no use of RecoveryIsPaused() with the > introduction of the new function GetRecoveryPauseState(). Others may > have better thoughts than me. > > > > [7] Can we change the switch-case in pg_is_wal_replay_paused to > > > something like below? > > > > > > Datum > > > pg_is_wal_replay_paused(PG_FUNCTION_ARGS) > > > { > > > +char *state; > > > +/* get the recovery pause state */ > > > +switch(GetRecoveryPauseState()) > > > +{ > > > +case RECOVERY_NOT_PAUSED: > > > +state = "not paused"; > > > +case RECOVERY_PAUSE_REQUESTED: > > > +state = "paused requested"; > > > +case RECOVERY_PAUSED: > > > +state = "paused"; > > > +default: > > > +elog(ERROR, "invalid recovery pause state"); > > > +} > > > + > > > +PG_RETURN_TEXT_P(cstring_to_text(type)); > > > > Why do you think it is better to use an extra variable? > > I see no wrong in having PG_RETURN_TEXT_P and cstring_to_text 's in > every case statement. But, just to make sure the code looks cleaner, I > said that we can have a local state variable and just one > PG_RETURN_TEXT_P(cstring_to_text(state));. See some existing functions > brin_page_type, hash_page_type, json_typeof, > pg_stat_get_backend_activity, pg_stat_get_backend_wait_event_type, > pg_stat_get_backend_wait_event, get_command_type. > I have changed as per other functions for consistency. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com From 5a42c06f866f4e52e136ac46092b3e7f75e9c3ec Mon Sep 17 00:00:00 2001 From: Dilip Kumar Date: Sat, 23 Jan 2021 11:59:58 +0530 Subject: [PATCH v7] pg_is_wal_replay_paused will return the status of recovery pause Currently, pg_is_wal_replay_paused, just check whether the recovery pause is requested or not but it doesn't actually tell whether the recovery is actually paused or not. So basically there is no way for the user to know the actual status of the pause request. With this patch the API will return the actual status of the recovery pause i.e. 'not paused' if pause is not requested 'pause requested' if pause is requested but recovery is not yet paused and 'paused' if recovery is actually paused. --- doc/src/sgml/func.sgml | 13 +++-- src/backend/access/transam/xlog.c | 88 +++--- src/backend/access/transam/xlogfuncs.c | 35 -- src/include/access/xlog.h | 12 -
Re: Is Recovery actually paused?
At Mon, 25 Jan 2021 10:05:19 +0530, Dilip Kumar wrote in > On Mon, Jan 25, 2021 at 8:42 AM Kyotaro Horiguchi > wrote: > > > > At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar > > wrote in > > > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy > > > wrote: > > > > > > > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar > > > > wrote: > > > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy > > > > > wrote: > > > > >> > > > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar > > > > >> wrote: > > > > >> > Please find the patch for the same. I haven't added a test case > > > > >> > for > > > > >> > this yet. I mean we can write a test case to pause the recovery > > > > >> > and > > > > >> > get the status. But I am not sure that we can really write a > > > > >> > reliable > > > > >> > test case for 'pause requested' and 'paused'. > > > > >> > > > > >> +1 to just show the recovery pause state in the output of > > > > >> pg_is_wal_replay_paused. But, should the function name > > > > >> "pg_is_wal_replay_paused" be something like > > > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" > > > > >> exists > > > > >> in a function, I expect a boolean output. Others may have better > > > > >> thoughts. > > > > >> > > > > >> IIUC the above change, ensuring the recovery is paused after it's > > > > >> requested lies with the user. IMHO, the main problem we are trying to > > > > >> solve is not met > > > > > > > > > > > > > > > Basically earlier their was no way for the user yo know whether the > > > > > recovery is actually paused or not because it was always returning > > > > > true after pause requested. Now, we will return whether pause > > > > > requested or actually paused. So > for tool designer who want to > > > > > wait for recovery to get paused can have a loop and wait until the > > > > > recovery state reaches to paused. That will give a better control. > > > > > > > > I get it and I agree to have that change. My point was whether we can > > > > have a new function pg_wal_replay_pause_and_wait that waits until > > > > recovery is actually paused ((along with pg_is_wal_replay_paused > > > > returning the actual state than a true/false) so that tool developers > > > > don't need to have the waiting code outside, if at all they care about > > > > it? Others may have better thoughts than me. > > > > > > I think the previous patch was based on that idea where we thought > > > that we can pass an argument to pg_is_wal_replay_paused which can > > > decide whether to wait or return without the wait. I think this > > > version looks better to me where we give the status instead of > > > waiting. I am not sure whether we want another version of > > > pg_wal_replay_pause which will wait for actually it to get paused. I > > > mean there is always a scope to include the functionality in the > > > database which can be achieved by the tool, but this patch was trying > > > to solve the problem that there was no way to know the status so I > > > think returning the correct status should be the scope of this. > > > > I understand that the requirement here is that no record is replayed > > after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused() > > returns true, and delays taken while recovery don't delay the state > > change. That requirements are really synchronous. > > > > On the other hand the machinery is designed to be asynchronous. > > > > >* Note that we intentionally don't take the info_lck spinlock > > >* here. We might therefore read a slightly stale value of > > >* the recoveryPause flag, but it can't be very stale (no > > >* worse than the last spinlock we did acquire). Since a > > >* pause request is a pretty asynchronous thing anyway, > > >* possibly responding to it one WAL record later than we > > >* otherwise would is a minor issue, so it doesn't seem worth > > >* adding another spinlock cycle to prevent that. > > > > As the result, this patch tries to introduce several new checkpoints > > to some delaying point so that that waits can find pause request in a > > timely manner. I think we had better use locking (or atomics) for the > > information instead of such scattered checkpoints if we expect that > > machinery to work in a such syhcnronous manner. > > > > That would make the tri-state state variable and many checkpoints > > unnecessary. Maybe. > > I don't think the intention was so to make it synchronous, I think > the main intention was that pg_is_wal_replay_paused can return us the > correct state, in short user can know that whether any more wal will > be replayed after pg_is_wal_replay_paused returns true or some other > state. I agree that along with that we have also introduced some I meant that kind of correctness in a time-series by using the word "synchronous". So it can be implemented both by adopting many checkpoints and by just makeing the state-change synchronou
Re: Is Recovery actually paused?
On Mon, Jan 25, 2021 at 8:42 AM Kyotaro Horiguchi wrote: > > At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar wrote > in > > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy > > wrote: > > > > > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar wrote: > > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy > > > > wrote: > > > >> > > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar > > > >> wrote: > > > >> > Please find the patch for the same. I haven't added a test case for > > > >> > this yet. I mean we can write a test case to pause the recovery and > > > >> > get the status. But I am not sure that we can really write a > > > >> > reliable > > > >> > test case for 'pause requested' and 'paused'. > > > >> > > > >> +1 to just show the recovery pause state in the output of > > > >> pg_is_wal_replay_paused. But, should the function name > > > >> "pg_is_wal_replay_paused" be something like > > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > > > >> in a function, I expect a boolean output. Others may have better > > > >> thoughts. > > > >> > > > >> IIUC the above change, ensuring the recovery is paused after it's > > > >> requested lies with the user. IMHO, the main problem we are trying to > > > >> solve is not met > > > > > > > > > > > > Basically earlier their was no way for the user yo know whether the > > > > recovery is actually paused or not because it was always returning true > > > > after pause requested. Now, we will return whether pause requested or > > > > actually paused. So > for tool designer who want to wait for recovery > > > > to get paused can have a loop and wait until the recovery state reaches > > > > to paused. That will give a better control. > > > > > > I get it and I agree to have that change. My point was whether we can > > > have a new function pg_wal_replay_pause_and_wait that waits until > > > recovery is actually paused ((along with pg_is_wal_replay_paused > > > returning the actual state than a true/false) so that tool developers > > > don't need to have the waiting code outside, if at all they care about > > > it? Others may have better thoughts than me. > > > > I think the previous patch was based on that idea where we thought > > that we can pass an argument to pg_is_wal_replay_paused which can > > decide whether to wait or return without the wait. I think this > > version looks better to me where we give the status instead of > > waiting. I am not sure whether we want another version of > > pg_wal_replay_pause which will wait for actually it to get paused. I > > mean there is always a scope to include the functionality in the > > database which can be achieved by the tool, but this patch was trying > > to solve the problem that there was no way to know the status so I > > think returning the correct status should be the scope of this. > > I understand that the requirement here is that no record is replayed > after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused() > returns true, and delays taken while recovery don't delay the state > change. That requirements are really synchronous. > > On the other hand the machinery is designed to be asynchronous. > > >* Note that we intentionally don't take the info_lck spinlock > >* here. We might therefore read a slightly stale value of > >* the recoveryPause flag, but it can't be very stale (no > >* worse than the last spinlock we did acquire). Since a > >* pause request is a pretty asynchronous thing anyway, > >* possibly responding to it one WAL record later than we > >* otherwise would is a minor issue, so it doesn't seem worth > >* adding another spinlock cycle to prevent that. > > As the result, this patch tries to introduce several new checkpoints > to some delaying point so that that waits can find pause request in a > timely manner. I think we had better use locking (or atomics) for the > information instead of such scattered checkpoints if we expect that > machinery to work in a such syhcnronous manner. > > That would make the tri-state state variable and many checkpoints > unnecessary. Maybe. I don't think the intention was so to make it synchronous, I think the main intention was that pg_is_wal_replay_paused can return us the correct state, in short user can know that whether any more wal will be replayed after pg_is_wal_replay_paused returns true or some other state. I agree that along with that we have also introduced some extra checkpoint where the recovery process is waiting for WAL and apply delay and from the pg_wal_replay_pause we had sent a signal to wakeup the recovery process. So I am not sure is it worth adding the lock/atomic variable to make this synchronous. Any other thoughts on this? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Mon, Jan 25, 2021 at 6:12 AM Masahiko Sawada wrote: > > On Sun, Jan 17, 2021 at 5:19 PM Dilip Kumar wrote: > > > > On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada > > wrote: > > > > > > On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar wrote: > > > > > > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar > > > > wrote: > > > > > > > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA > > > > > wrote: > > > > > > > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > > > > Dilip Kumar wrote: > > > > > > > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused > > > > > > > > > to wait. > > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will > > > > > > > > > be blocked forever, > > > > > > > > > although this setting may not be usual. In addition, some > > > > > > > > > users may set > > > > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > > > > pg_is_wal_replay_paused, > > > > > > > > > it could wait for a long time. > > > > > > > > > > > > > > > > > > At least, I think we need some descriptions on document to > > > > > > > > > explain > > > > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > > > > > > > Ok > > > > > > > > > > > > > > Fixed this, added some comments in .sgml as well as in function > > > > > > > header > > > > > > > > > > > > Thank you for fixing this. > > > > > > > > > > > > Also, is it better to fix the description of pg_wal_replay_pause > > > > > > from > > > > > > "Pauses recovery." to "Request to pause recovery." in according with > > > > > > pg_is_wal_replay_paused? > > > > > > > > > > Okay > > > > > > > > > > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > > > > pg_is_wal_replay_paused to > > > > > > > > > control whether this waits for recovery to get paused or not? > > > > > > > > > By setting its > > > > > > > > > default value to true or false, users can use the old format > > > > > > > > > for calling this > > > > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we > > > > > > > > will > > > > > > > > immediately return true if the pause is requested? I agree > > > > > > > > that it is > > > > > > > > good to have an API to know whether the recovery pause is > > > > > > > > requested or > > > > > > > > not but I am not sure is it good idea to make this API serve > > > > > > > > both the > > > > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another > > > > > > purpose; > > > > > > this waits recovery to actually get paused. If we want to limit > > > > > > this API's > > > > > > purpose only to return the pause state, it seems better to fix this > > > > > > to return > > > > > > the actual state at the cost of lacking the backward compatibility. > > > > > > If we want > > > > > > to know whether pause is requested, we may add a new API like > > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > > > > > > recovery to actually > > > > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > > > > purpose. > > > > > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I > > > > > > don't care either. > > > > > > > > > > I don't think that it will be blocked ever, because > > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > > > recovery process will not be stuck on waiting for the WAL. > > > > > > > > > > > > > > As another comment, while pg_is_wal_replay_paused is > > > > > > > > > blocking, I can not cancel > > > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the > > > > > > > > > waiting loop. > > > > > > > > > > > > How about this fix? I think users may want to cancel > > > > > > pg_is_wal_replay_paused() during > > > > > > this is blocking. > > > > > > > > > > Yeah, we can do this. I will send the updated patch after putting > > > > > some more thought into these comments. Thanks again for the feedback. > > > > > > > > > > > > > Please find the updated patch. > > > > > > I've looked at the patch. Here are review comments: > > > > > > + /* Recovery pause state */ > > > + RecoveryPauseState recoveryPause; > > > > > > Now that the value can have tri-state, how about renaming it to > > > recoveryPauseState? > > > > This makes sense to me. > > > > > --- > > > bool > > > RecoveryIsPaused(void) > > > +{ > > > + boolrecoveryPause; > > > + > > > + SpinLockAcquire(&XLogCtl->info_lck); > > > + recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ? > > > true : false; > > > + SpinLockRelease(&XLogCtl->info_lck); > > > + > > > + return recoveryPause; > > > +} > > >
Re: Is Recovery actually paused?
At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar wrote in > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy > wrote: > > > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar wrote: > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy > > > wrote: > > >> > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar > > >> wrote: > > >> > Please find the patch for the same. I haven't added a test case for > > >> > this yet. I mean we can write a test case to pause the recovery and > > >> > get the status. But I am not sure that we can really write a reliable > > >> > test case for 'pause requested' and 'paused'. > > >> > > >> +1 to just show the recovery pause state in the output of > > >> pg_is_wal_replay_paused. But, should the function name > > >> "pg_is_wal_replay_paused" be something like > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > > >> in a function, I expect a boolean output. Others may have better > > >> thoughts. > > >> > > >> IIUC the above change, ensuring the recovery is paused after it's > > >> requested lies with the user. IMHO, the main problem we are trying to > > >> solve is not met > > > > > > > > > Basically earlier their was no way for the user yo know whether the > > > recovery is actually paused or not because it was always returning true > > > after pause requested. Now, we will return whether pause requested or > > > actually paused. So > for tool designer who want to wait for recovery to > > > get paused can have a loop and wait until the recovery state reaches to > > > paused. That will give a better control. > > > > I get it and I agree to have that change. My point was whether we can > > have a new function pg_wal_replay_pause_and_wait that waits until > > recovery is actually paused ((along with pg_is_wal_replay_paused > > returning the actual state than a true/false) so that tool developers > > don't need to have the waiting code outside, if at all they care about > > it? Others may have better thoughts than me. > > I think the previous patch was based on that idea where we thought > that we can pass an argument to pg_is_wal_replay_paused which can > decide whether to wait or return without the wait. I think this > version looks better to me where we give the status instead of > waiting. I am not sure whether we want another version of > pg_wal_replay_pause which will wait for actually it to get paused. I > mean there is always a scope to include the functionality in the > database which can be achieved by the tool, but this patch was trying > to solve the problem that there was no way to know the status so I > think returning the correct status should be the scope of this. I understand that the requirement here is that no record is replayed after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused() returns true, and delays taken while recovery don't delay the state change. That requirements are really synchronous. On the other hand the machinery is designed to be asynchronous. >* Note that we intentionally don't take the info_lck spinlock >* here. We might therefore read a slightly stale value of >* the recoveryPause flag, but it can't be very stale (no >* worse than the last spinlock we did acquire). Since a >* pause request is a pretty asynchronous thing anyway, >* possibly responding to it one WAL record later than we >* otherwise would is a minor issue, so it doesn't seem worth >* adding another spinlock cycle to prevent that. As the result, this patch tries to introduce several new checkpoints to some delaying point so that that waits can find pause request in a timely manner. I think we had better use locking (or atomics) for the information instead of such scattered checkpoints if we expect that machinery to work in a such syhcnronous manner. That would make the tri-state state variable and many checkpoints unnecessary. Maybe. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Sun, Jan 17, 2021 at 5:19 PM Dilip Kumar wrote: > > On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada wrote: > > > > On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar wrote: > > > > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar wrote: > > > > > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA wrote: > > > > > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > > > Dilip Kumar wrote: > > > > > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to > > > > > > > > wait. > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be > > > > > > > > blocked forever, > > > > > > > > although this setting may not be usual. In addition, some users > > > > > > > > may set > > > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > > > pg_is_wal_replay_paused, > > > > > > > > it could wait for a long time. > > > > > > > > > > > > > > > > At least, I think we need some descriptions on document to > > > > > > > > explain > > > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > > > > > Ok > > > > > > > > > > > > Fixed this, added some comments in .sgml as well as in function > > > > > > header > > > > > > > > > > Thank you for fixing this. > > > > > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > > > "Pauses recovery." to "Request to pause recovery." in according with > > > > > pg_is_wal_replay_paused? > > > > > > > > Okay > > > > > > > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > > > pg_is_wal_replay_paused to > > > > > > > > control whether this waits for recovery to get paused or not? > > > > > > > > By setting its > > > > > > > > default value to true or false, users can use the old format > > > > > > > > for calling this > > > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we > > > > > > > will > > > > > > > immediately return true if the pause is requested? I agree that > > > > > > > it is > > > > > > > good to have an API to know whether the recovery pause is > > > > > > > requested or > > > > > > > not but I am not sure is it good idea to make this API serve both > > > > > > > the > > > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another > > > > > purpose; > > > > > this waits recovery to actually get paused. If we want to limit this > > > > > API's > > > > > purpose only to return the pause state, it seems better to fix this > > > > > to return > > > > > the actual state at the cost of lacking the backward compatibility. > > > > > If we want > > > > > to know whether pause is requested, we may add a new API like > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > > > > > recovery to actually > > > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > > > purpose. > > > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't > > > > > care either. > > > > > > > > I don't think that it will be blocked ever, because > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > > recovery process will not be stuck on waiting for the WAL. > > > > > > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, > > > > > > > > I can not cancel > > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the > > > > > > > > waiting loop. > > > > > > > > > > How about this fix? I think users may want to cancel > > > > > pg_is_wal_replay_paused() during > > > > > this is blocking. > > > > > > > > Yeah, we can do this. I will send the updated patch after putting > > > > some more thought into these comments. Thanks again for the feedback. > > > > > > > > > > Please find the updated patch. > > > > I've looked at the patch. Here are review comments: > > > > + /* Recovery pause state */ > > + RecoveryPauseState recoveryPause; > > > > Now that the value can have tri-state, how about renaming it to > > recoveryPauseState? > > This makes sense to me. > > > --- > > bool > > RecoveryIsPaused(void) > > +{ > > + boolrecoveryPause; > > + > > + SpinLockAcquire(&XLogCtl->info_lck); > > + recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ? > > true : false; > > + SpinLockRelease(&XLogCtl->info_lck); > > + > > + return recoveryPause; > > +} > > + > > +bool > > +RecoveryPauseRequested(void) > > { > > boolrecoveryPause; > > > > SpinLockAcquire(&XLogCtl->info_lck); > > - recoveryPause = XLogCtl->recoveryPause; > > + recoveryPause = (XLogCtl->recoveryPause != > > RECOVERY_IN_PROGRESS) ? true : false; > > SpinLockRelease(&XLogCtl->i
Re: Is Recovery actually paused?
On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy wrote: > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar wrote: > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy > > wrote: > >> > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar wrote: > >> > Please find the patch for the same. I haven't added a test case for > >> > this yet. I mean we can write a test case to pause the recovery and > >> > get the status. But I am not sure that we can really write a reliable > >> > test case for 'pause requested' and 'paused'. > >> > >> +1 to just show the recovery pause state in the output of > >> pg_is_wal_replay_paused. But, should the function name > >> "pg_is_wal_replay_paused" be something like > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > >> in a function, I expect a boolean output. Others may have better > >> thoughts. > >> > >> IIUC the above change, ensuring the recovery is paused after it's > >> requested lies with the user. IMHO, the main problem we are trying to > >> solve is not met > > > > > > Basically earlier their was no way for the user yo know whether the > > recovery is actually paused or not because it was always returning true > > after pause requested. Now, we will return whether pause requested or > > actually paused. So > for tool designer who want to wait for recovery to > > get paused can have a loop and wait until the recovery state reaches to > > paused. That will give a better control. > > I get it and I agree to have that change. My point was whether we can > have a new function pg_wal_replay_pause_and_wait that waits until > recovery is actually paused ((along with pg_is_wal_replay_paused > returning the actual state than a true/false) so that tool developers > don't need to have the waiting code outside, if at all they care about > it? Others may have better thoughts than me. I think the previous patch was based on that idea where we thought that we can pass an argument to pg_is_wal_replay_paused which can decide whether to wait or return without the wait. I think this version looks better to me where we give the status instead of waiting. I am not sure whether we want another version of pg_wal_replay_pause which will wait for actually it to get paused. I mean there is always a scope to include the functionality in the database which can be achieved by the tool, but this patch was trying to solve the problem that there was no way to know the status so I think returning the correct status should be the scope of this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Sun, Jan 24, 2021 at 11:29 AM Dilip Kumar wrote: > > Some comments on the v6 patch: > > [2] Typo - it's "requested" + * 'paused requested' - if pause is > > reqested but recovery is not yet paused Here I meant the typo "reqested" in "if pause is reqested but recovery is not yet paused" statement from v6 patch. > > [3] I think it's "+ * 'pause requested'" instead of "+ * 'paused requested'" > > Which code does it refer, can give put the snippet from the patch. > However, I have found there were 'paused requested' in two places so I > have fixed. Thanks. > > [6] As I mentioned upthread, isn't it better to have > > "IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"? > > That is an existing function so I think it's fine to keep the same name. Personally, I think the function RecoveryIsPaused itself is unnecessary with the new function GetRecoveryPauseState introduced in your patch. IMHO, we can remove it. If not okay, then we are at it, can we at least change the function name to be meaningful "IsRecoveryPaused"? Others may have better thoughts than me. > > [7] Can we have the function variable name "recoveryPause" as "state" > > or "pauseState? Because that variable context is set by the enum name > > RecoveryPauseState and the function name. > > > > +SetRecoveryPause(RecoveryPauseState recoveryPause) > > > > Here as well, "recoveryPauseState" to "state"? > > +GetRecoveryPauseState(void) > > { > > -boolrecoveryPause; > > +RecoveryPauseStaterecoveryPauseState; > > I don't think it is required but while changing the patch I will see > whether to change or not. It will be good to change that. I personally don't like structure names and variable names to be the same. > > [6] Function name RecoveryIsPaused and it's comment "Check whether the > > recovery pause is requested." doesn't seem to be matching. Seems like > > it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED. > > Should it return true only when the state is RECOVERY_PAUSE_REQUESTED? > > Code is doing right, I will change the comments. > > > Instead of "while (RecoveryIsPaused())", can't we change it to "while > > (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the > > RecoveryIsPaused()? > > I think it looks clean with the function As I said earlier, I see no use of RecoveryIsPaused() with the introduction of the new function GetRecoveryPauseState(). Others may have better thoughts than me. > > [7] Can we change the switch-case in pg_is_wal_replay_paused to > > something like below? > > > > Datum > > pg_is_wal_replay_paused(PG_FUNCTION_ARGS) > > { > > +char *state; > > +/* get the recovery pause state */ > > +switch(GetRecoveryPauseState()) > > +{ > > +case RECOVERY_NOT_PAUSED: > > +state = "not paused"; > > +case RECOVERY_PAUSE_REQUESTED: > > +state = "paused requested"; > > +case RECOVERY_PAUSED: > > +state = "paused"; > > +default: > > +elog(ERROR, "invalid recovery pause state"); > > +} > > + > > +PG_RETURN_TEXT_P(cstring_to_text(type)); > > Why do you think it is better to use an extra variable? I see no wrong in having PG_RETURN_TEXT_P and cstring_to_text 's in every case statement. But, just to make sure the code looks cleaner, I said that we can have a local state variable and just one PG_RETURN_TEXT_P(cstring_to_text(state));. See some existing functions brin_page_type, hash_page_type, json_typeof, pg_stat_get_backend_activity, pg_stat_get_backend_wait_event_type, pg_stat_get_backend_wait_event, get_command_type. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar wrote: > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy > wrote: >> >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar wrote: >> > Please find the patch for the same. I haven't added a test case for >> > this yet. I mean we can write a test case to pause the recovery and >> > get the status. But I am not sure that we can really write a reliable >> > test case for 'pause requested' and 'paused'. >> >> +1 to just show the recovery pause state in the output of >> pg_is_wal_replay_paused. But, should the function name >> "pg_is_wal_replay_paused" be something like >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists >> in a function, I expect a boolean output. Others may have better >> thoughts. >> >> IIUC the above change, ensuring the recovery is paused after it's >> requested lies with the user. IMHO, the main problem we are trying to >> solve is not met > > > Basically earlier their was no way for the user yo know whether the recovery > is actually paused or not because it was always returning true after pause > requested. Now, we will return whether pause requested or actually paused. > So > for tool designer who want to wait for recovery to get paused can have a > loop and wait until the recovery state reaches to paused. That will give a > better control. I get it and I agree to have that change. My point was whether we can have a new function pg_wal_replay_pause_and_wait that waits until recovery is actually paused ((along with pg_is_wal_replay_paused returning the actual state than a true/false) so that tool developers don't need to have the waiting code outside, if at all they care about it? Others may have better thoughts than me. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Sat, Jan 23, 2021 at 4:40 PM Bharath Rupireddy wrote: > > On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar wrote: > > Please find the patch for the same. I haven't added a test case for > > this yet. I mean we can write a test case to pause the recovery and > > get the status. But I am not sure that we can really write a reliable > > test case for 'pause requested' and 'paused'. > > +1 to just show the recovery pause state in the output of > pg_is_wal_replay_paused. But, should the function name > "pg_is_wal_replay_paused" be something like > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > in a function, I expect a boolean output. Others may have better > thoughts. I am fine with the name change, but don't feel that it will be completely wrong if pg_is_wal_replay_paused returns a different state of the recovery pause. So I would like to see what others thinks and based on that we can decide. > IIUC the above change, ensuring the recovery is paused after it's > requested lies with the user. IMHO, the main problem we are trying to > solve is not met. Isn't it better if we have a new function(wait > version) along with the above change to pg_is_wal_replay_paused, > something like "pg_wal_replay_pause_and_wait" returning true or false? > The functionality is pg_wal_replay_pause + wait until it's actually > paused. > > Thoughts? Already replied in the last mail. > Some comments on the v6 patch: > > [1] How about > + * This function returns the current state of the recovery pause. > instead of > + * This api will return the current state of the recovery pause. Okay > [2] Typo - it's "requested" + * 'paused requested' - if pause is > reqested but recovery is not yet paused > > [3] I think it's "+ * 'pause requested'" instead of "+ * 'paused requested'" Which code does it refer, can give put the snippet from the patch. However, I have found there were 'paused requested' in two places so I have fixed. > [4] Isn't it good to have an example usage and output of the function > in the documentaion? > +Returns recovery pause status, which is not > paused if > +pause is not requested, pause requested if pause > is > +requested but recovery is not yet paused and, > paused if > +the recovery is actually paused. > I will add. > [5] Is it > + * Wait until shared recoveryPause state is set to RECOVERY_NOT_PAUSED. > instead of > + * Wait until shared recoveryPause is set to RECOVERY_NOT_PAUSED. Ok > [6] As I mentioned upthread, isn't it better to have > "IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"? That is an existing function so I think it's fine to keep the same name. > [7] Can we have the function variable name "recoveryPause" as "state" > or "pauseState? Because that variable context is set by the enum name > RecoveryPauseState and the function name. > > +SetRecoveryPause(RecoveryPauseState recoveryPause) > > Here as well, "recoveryPauseState" to "state"? > +GetRecoveryPauseState(void) > { > -boolrecoveryPause; > +RecoveryPauseStaterecoveryPauseState; I don't think it is required but while changing the patch I will see whether to change or not. > [6] Function name RecoveryIsPaused and it's comment "Check whether the > recovery pause is requested." doesn't seem to be matching. Seems like > it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED. > Should it return true only when the state is RECOVERY_PAUSE_REQUESTED? Code is doing right, I will change the comments. > Instead of "while (RecoveryIsPaused())", can't we change it to "while > (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the > RecoveryIsPaused()? I think it looks clean with the function > [7] Can we change the switch-case in pg_is_wal_replay_paused to > something like below? > > Datum > pg_is_wal_replay_paused(PG_FUNCTION_ARGS) > { > +char *state; > +/* get the recovery pause state */ > +switch(GetRecoveryPauseState()) > +{ > +case RECOVERY_NOT_PAUSED: > +state = "not paused"; > +case RECOVERY_PAUSE_REQUESTED: > +state = "paused requested"; > +case RECOVERY_PAUSED: > +state = "paused"; > +default: > +elog(ERROR, "invalid recovery pause state"); > +} > + > +PG_RETURN_TEXT_P(cstring_to_text(type)); Why do you think it is better to use an extra variable? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar wrote: > > Please find the patch for the same. I haven't added a test case for > > this yet. I mean we can write a test case to pause the recovery and > > get the status. But I am not sure that we can really write a reliable > > test case for 'pause requested' and 'paused'. > > +1 to just show the recovery pause state in the output of > pg_is_wal_replay_paused. But, should the function name > "pg_is_wal_replay_paused" be something like > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists > in a function, I expect a boolean output. Others may have better > thoughts. > > IIUC the above change, ensuring the recovery is paused after it's > requested lies with the user. IMHO, the main problem we are trying to > solve is not met Basically earlier their was no way for the user yo know whether the recovery is actually paused or not because it was always returning true after pause requested. Now, we will return whether pause requested or actually paused. So for tool designer who want to wait for recovery to get paused can have a loop and wait until the recovery state reaches to paused. That will give a better control. I will check other comments and respond along with the patch. > -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar wrote: > Please find the patch for the same. I haven't added a test case for > this yet. I mean we can write a test case to pause the recovery and > get the status. But I am not sure that we can really write a reliable > test case for 'pause requested' and 'paused'. +1 to just show the recovery pause state in the output of pg_is_wal_replay_paused. But, should the function name "pg_is_wal_replay_paused" be something like "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists in a function, I expect a boolean output. Others may have better thoughts. IIUC the above change, ensuring the recovery is paused after it's requested lies with the user. IMHO, the main problem we are trying to solve is not met. Isn't it better if we have a new function(wait version) along with the above change to pg_is_wal_replay_paused, something like "pg_wal_replay_pause_and_wait" returning true or false? The functionality is pg_wal_replay_pause + wait until it's actually paused. Thoughts? Some comments on the v6 patch: [1] How about + * This function returns the current state of the recovery pause. instead of + * This api will return the current state of the recovery pause. [2] Typo - it's "requested" + * 'paused requested' - if pause is reqested but recovery is not yet paused [3] I think it's "+ * 'pause requested'" instead of "+ * 'paused requested'" [4] Isn't it good to have an example usage and output of the function in the documentaion? +Returns recovery pause status, which is not paused if +pause is not requested, pause requested if pause is +requested but recovery is not yet paused and, paused if +the recovery is actually paused. [5] Is it + * Wait until shared recoveryPause state is set to RECOVERY_NOT_PAUSED. instead of + * Wait until shared recoveryPause is set to RECOVERY_NOT_PAUSED. [6] As I mentioned upthread, isn't it better to have "IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"? [7] Can we have the function variable name "recoveryPause" as "state" or "pauseState? Because that variable context is set by the enum name RecoveryPauseState and the function name. +SetRecoveryPause(RecoveryPauseState recoveryPause) Here as well, "recoveryPauseState" to "state"? +GetRecoveryPauseState(void) { -boolrecoveryPause; +RecoveryPauseStaterecoveryPauseState; [6] Function name RecoveryIsPaused and it's comment "Check whether the recovery pause is requested." doesn't seem to be matching. Seems like it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED. Should it return true only when the state is RECOVERY_PAUSE_REQUESTED? Instead of "while (RecoveryIsPaused())", can't we change it to "while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the RecoveryIsPaused()? [7] Can we change the switch-case in pg_is_wal_replay_paused to something like below? Datum pg_is_wal_replay_paused(PG_FUNCTION_ARGS) { +char *state; +/* get the recovery pause state */ +switch(GetRecoveryPauseState()) +{ +case RECOVERY_NOT_PAUSED: +state = "not paused"; +case RECOVERY_PAUSE_REQUESTED: +state = "paused requested"; +case RECOVERY_PAUSED: +state = "paused"; +default: +elog(ERROR, "invalid recovery pause state"); +} + +PG_RETURN_TEXT_P(cstring_to_text(type)); With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Sat, Jan 23, 2021 at 9:56 AM Dilip Kumar wrote: > > On Fri, Jan 22, 2021 at 2:18 AM Robert Haas wrote: > > > > On Mon, Jan 18, 2021 at 9:42 PM Yugo NAGATA wrote: > > > If it is acceptable that pg_is_wal_replay_paused() makes users wait, > > > I'm ok for the current interface. I don't feel the need of > > > pg_is_wal_replay_paluse_requeseted(). > > > > Another idea could be that pg_is_wal_replay_paused() could be changed > > to text, and the string could be either 'paused' or 'pause requested' > > or 'not paused'. That way we'd be returning a direct representation of > > the state we're keeping in memory. Some of the complexity in this > > discussion seems to come from trying to squeeze 3 possibilities into a > > Boolean. > > > > Let's also consider that we don't really know whether the client wants > > us to wait or not, and different clients may want different things, or > > maybe not, but we don't really know at this point. If we provide an > > interface that waits, and the client doesn't want to wait but just > > know the current state, they don't necessarily have any great options. > > If we provide an interface that doesn't wait, and the client wants to > > wait, it can poll until it gets the answer it wants. Polling can be > > inefficient, but anybody who is writing a tool that uses this should > > be able to manage an algorithm with some reasonable back-off behavior > > (e.g. try after 10ms, 20ms, keep doubling, max of 5s, or something of > > that sort), so I'm not sure there's actually any real problem in > > practice. So to me it seems more likely that an interface that is > > based on waiting will cause difficulty for tool-writers than one that > > does not. > > > > Other people may feel differently, of course... > > I think this is the better way of handling this. So +1 from my side, > I will send an updated patch. Please find the patch for the same. I haven't added a test case for this yet. I mean we can write a test case to pause the recovery and get the status. But I am not sure that we can really write a reliable test case for 'pause requested' and 'paused'. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com v6-0001-pg_is_wal_replay_paused-will-return-the-status-of.patch Description: Binary data
Re: Is Recovery actually paused?
On Thu, Jan 21, 2021 at 6:20 PM Yugo NAGATA wrote: > > On Tue, 19 Jan 2021 21:32:31 +0530 > Dilip Kumar wrote: > > > On Tue, Jan 19, 2021 at 8:34 AM Dilip Kumar wrote: > > > > > > On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA wrote: > > >> > > >> On Sun, 17 Jan 2021 11:33:52 +0530 > > >> Dilip Kumar wrote: > > >> > > >> > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA > > >> > wrote: > > >> > > > > >> > > On Wed, 13 Jan 2021 17:49:43 +0530 > > >> > > Dilip Kumar wrote: > > >> > > > > >> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar > > >> > > > wrote: > > >> > > > > > > >> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA > > >> > > > > wrote: > > >> > > > > > > > >> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > >> > > > > > Dilip Kumar wrote: > > >> > > > > > > > >> > > > > > > > > However, I wonder users don't expect > > >> > > > > > > > > pg_is_wal_replay_paused to wait. > > >> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this > > >> > > > > > > > > will be blocked forever, > > >> > > > > > > > > although this setting may not be usual. In addition, > > >> > > > > > > > > some users may set > > >> > > > > > > > > recovery_min_apply_delay for a large. If such users > > >> > > > > > > > > call pg_is_wal_replay_paused, > > >> > > > > > > > > it could wait for a long time. > > >> > > > > > > > > > > >> > > > > > > > > At least, I think we need some descriptions on document > > >> > > > > > > > > to explain > > >> > > > > > > > > pg_is_wal_replay_paused could wait while a time. > > >> > > > > > > > > > >> > > > > > > > Ok > > >> > > > > > > > > >> > > > > > > Fixed this, added some comments in .sgml as well as in > > >> > > > > > > function header > > >> > > > > > > > >> > > > > > Thank you for fixing this. > > >> > > > > > > > >> > > > > > Also, is it better to fix the description of > > >> > > > > > pg_wal_replay_pause from > > >> > > > > > "Pauses recovery." to "Request to pause recovery." in > > >> > > > > > according with > > >> > > > > > pg_is_wal_replay_paused? > > >> > > > > > > >> > > > > Okay > > >> > > > > > > >> > > > > > > > >> > > > > > > > > Also, how about adding a new boolean argument to > > >> > > > > > > > > pg_is_wal_replay_paused to > > >> > > > > > > > > control whether this waits for recovery to get paused or > > >> > > > > > > > > not? By setting its > > >> > > > > > > > > default value to true or false, users can use the old > > >> > > > > > > > > format for calling this > > >> > > > > > > > > and the backward compatibility can be maintained. > > >> > > > > > > > > > >> > > > > > > > So basically, if the wait_recovery_pause flag is false > > >> > > > > > > > then we will > > >> > > > > > > > immediately return true if the pause is requested? I > > >> > > > > > > > agree that it is > > >> > > > > > > > good to have an API to know whether the recovery pause is > > >> > > > > > > > requested or > > >> > > > > > > > not but I am not sure is it good idea to make this API > > >> > > > > > > > serve both the > > >> > > > > > > > purpose? Anyone else have any thoughts on this? > > >> > > > > > > > > > >> > > > > > > > >> > > > > > I think the current pg_is_wal_replay_paused() already has > > >> > > > > > another purpose; > > >> > > > > > this waits recovery to actually get paused. If we want to > > >> > > > > > limit this API's > > >> > > > > > purpose only to return the pause state, it seems better to fix > > >> > > > > > this to return > > >> > > > > > the actual state at the cost of lacking the backward > > >> > > > > > compatibility. If we want > > >> > > > > > to know whether pause is requested, we may add a new API like > > >> > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > > >> > > > > > recovery to actually > > >> > > > > > get paused, we may add an option to pg_wal_replay_pause() for > > >> > > > > > this purpose. > > >> > > > > > > > >> > > > > > However, this might be a bikeshedding. If anyone don't care > > >> > > > > > that > > >> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, > > >> > > > > > I don't care either. > > >> > > > > > > >> > > > > I don't think that it will be blocked ever, because > > >> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means > > >> > > > > the > > >> > > > > recovery process will not be stuck on waiting for the WAL. > > >> > > > > >> > > Yes, there is no stuck on waiting for the WAL. However, it can be > > >> > > stuck during resolving > > >> > > a recovery conflict. The process could wait for > > >> > > max_standby_streaming_delay or > > >> > > max_standby_archive_delay at most before recovery get completely > > >> > > paused. > > >> > > > >> > Okay, I agree that it is possible so for handling this we have a > > >> > couple of options > > >> > 1. pg_is_wal_replay_paused(), interface will wait for recovery to > > >> > actually get paused, but user have an option to cancel that. So I > > >> > agree that there is cu
Re: Is Recovery actually paused?
On Fri, Jan 22, 2021 at 2:18 AM Robert Haas wrote: > > On Mon, Jan 18, 2021 at 9:42 PM Yugo NAGATA wrote: > > If it is acceptable that pg_is_wal_replay_paused() makes users wait, > > I'm ok for the current interface. I don't feel the need of > > pg_is_wal_replay_paluse_requeseted(). > > Another idea could be that pg_is_wal_replay_paused() could be changed > to text, and the string could be either 'paused' or 'pause requested' > or 'not paused'. That way we'd be returning a direct representation of > the state we're keeping in memory. Some of the complexity in this > discussion seems to come from trying to squeeze 3 possibilities into a > Boolean. > > Let's also consider that we don't really know whether the client wants > us to wait or not, and different clients may want different things, or > maybe not, but we don't really know at this point. If we provide an > interface that waits, and the client doesn't want to wait but just > know the current state, they don't necessarily have any great options. > If we provide an interface that doesn't wait, and the client wants to > wait, it can poll until it gets the answer it wants. Polling can be > inefficient, but anybody who is writing a tool that uses this should > be able to manage an algorithm with some reasonable back-off behavior > (e.g. try after 10ms, 20ms, keep doubling, max of 5s, or something of > that sort), so I'm not sure there's actually any real problem in > practice. So to me it seems more likely that an interface that is > based on waiting will cause difficulty for tool-writers than one that > does not. > > Other people may feel differently, of course... I think this is the better way of handling this. So +1 from my side, I will send an updated patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Mon, Jan 18, 2021 at 9:42 PM Yugo NAGATA wrote: > If it is acceptable that pg_is_wal_replay_paused() makes users wait, > I'm ok for the current interface. I don't feel the need of > pg_is_wal_replay_paluse_requeseted(). Another idea could be that pg_is_wal_replay_paused() could be changed to text, and the string could be either 'paused' or 'pause requested' or 'not paused'. That way we'd be returning a direct representation of the state we're keeping in memory. Some of the complexity in this discussion seems to come from trying to squeeze 3 possibilities into a Boolean. Let's also consider that we don't really know whether the client wants us to wait or not, and different clients may want different things, or maybe not, but we don't really know at this point. If we provide an interface that waits, and the client doesn't want to wait but just know the current state, they don't necessarily have any great options. If we provide an interface that doesn't wait, and the client wants to wait, it can poll until it gets the answer it wants. Polling can be inefficient, but anybody who is writing a tool that uses this should be able to manage an algorithm with some reasonable back-off behavior (e.g. try after 10ms, 20ms, keep doubling, max of 5s, or something of that sort), so I'm not sure there's actually any real problem in practice. So to me it seems more likely that an interface that is based on waiting will cause difficulty for tool-writers than one that does not. Other people may feel differently, of course... -- Robert Haas EDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Tue, 19 Jan 2021 21:32:31 +0530 Dilip Kumar wrote: > On Tue, Jan 19, 2021 at 8:34 AM Dilip Kumar wrote: > > > > On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA wrote: > >> > >> On Sun, 17 Jan 2021 11:33:52 +0530 > >> Dilip Kumar wrote: > >> > >> > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA wrote: > >> > > > >> > > On Wed, 13 Jan 2021 17:49:43 +0530 > >> > > Dilip Kumar wrote: > >> > > > >> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar > >> > > > wrote: > >> > > > > > >> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA > >> > > > > wrote: > >> > > > > > > >> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > >> > > > > > Dilip Kumar wrote: > >> > > > > > > >> > > > > > > > > However, I wonder users don't expect > >> > > > > > > > > pg_is_wal_replay_paused to wait. > >> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this > >> > > > > > > > > will be blocked forever, > >> > > > > > > > > although this setting may not be usual. In addition, some > >> > > > > > > > > users may set > >> > > > > > > > > recovery_min_apply_delay for a large. If such users call > >> > > > > > > > > pg_is_wal_replay_paused, > >> > > > > > > > > it could wait for a long time. > >> > > > > > > > > > >> > > > > > > > > At least, I think we need some descriptions on document to > >> > > > > > > > > explain > >> > > > > > > > > pg_is_wal_replay_paused could wait while a time. > >> > > > > > > > > >> > > > > > > > Ok > >> > > > > > > > >> > > > > > > Fixed this, added some comments in .sgml as well as in > >> > > > > > > function header > >> > > > > > > >> > > > > > Thank you for fixing this. > >> > > > > > > >> > > > > > Also, is it better to fix the description of pg_wal_replay_pause > >> > > > > > from > >> > > > > > "Pauses recovery." to "Request to pause recovery." in according > >> > > > > > with > >> > > > > > pg_is_wal_replay_paused? > >> > > > > > >> > > > > Okay > >> > > > > > >> > > > > > > >> > > > > > > > > Also, how about adding a new boolean argument to > >> > > > > > > > > pg_is_wal_replay_paused to > >> > > > > > > > > control whether this waits for recovery to get paused or > >> > > > > > > > > not? By setting its > >> > > > > > > > > default value to true or false, users can use the old > >> > > > > > > > > format for calling this > >> > > > > > > > > and the backward compatibility can be maintained. > >> > > > > > > > > >> > > > > > > > So basically, if the wait_recovery_pause flag is false then > >> > > > > > > > we will > >> > > > > > > > immediately return true if the pause is requested? I agree > >> > > > > > > > that it is > >> > > > > > > > good to have an API to know whether the recovery pause is > >> > > > > > > > requested or > >> > > > > > > > not but I am not sure is it good idea to make this API serve > >> > > > > > > > both the > >> > > > > > > > purpose? Anyone else have any thoughts on this? > >> > > > > > > > > >> > > > > > > >> > > > > > I think the current pg_is_wal_replay_paused() already has > >> > > > > > another purpose; > >> > > > > > this waits recovery to actually get paused. If we want to limit > >> > > > > > this API's > >> > > > > > purpose only to return the pause state, it seems better to fix > >> > > > > > this to return > >> > > > > > the actual state at the cost of lacking the backward > >> > > > > > compatibility. If we want > >> > > > > > to know whether pause is requested, we may add a new API like > >> > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > >> > > > > > recovery to actually > >> > > > > > get paused, we may add an option to pg_wal_replay_pause() for > >> > > > > > this purpose. > >> > > > > > > >> > > > > > However, this might be a bikeshedding. If anyone don't care that > >> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I > >> > > > > > don't care either. > >> > > > > > >> > > > > I don't think that it will be blocked ever, because > >> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > >> > > > > recovery process will not be stuck on waiting for the WAL. > >> > > > >> > > Yes, there is no stuck on waiting for the WAL. However, it can be > >> > > stuck during resolving > >> > > a recovery conflict. The process could wait for > >> > > max_standby_streaming_delay or > >> > > max_standby_archive_delay at most before recovery get completely > >> > > paused. > >> > > >> > Okay, I agree that it is possible so for handling this we have a > >> > couple of options > >> > 1. pg_is_wal_replay_paused(), interface will wait for recovery to > >> > actually get paused, but user have an option to cancel that. So I > >> > agree that there is currently no option to just know that recovery > >> > pause is requested without waiting for its actually get paused if it > >> > is requested. So one option is we can provide an another interface as > >> > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just > >> > return the request status. I a
Re: Is Recovery actually paused?
On Thu, Jan 21, 2021 at 3:29 PM Bharath Rupireddy wrote: Thanks for reviewing Bharat. > On Tue, Jan 19, 2021 at 9:32 PM Dilip Kumar wrote: > > In the last patch there were some local changes which I did not add to > > the patch and it was giving compilation warning so fixed that along > > with that I have addressed your this comment as well. > > Thanks for the patch. I took a look at the v5 patch, below are some > comments. Please ignore if I'm repeating any of the comments discussed > upthread. > > [1] Can we also have a wait version for pg_wal_replay_pause that waits > until recovery is actually paused right after setting the recovery > state to RECOVERY_PAUSE_REQUESTED? Something like this - > pg_wal_replay_pause_and_wait(wait boolean, wait_seconds integer > DEFAULT 60) returns boolean. It waits until either default or provided > wait_seconds and returns true if the recovery is paused within that > wait_seconds otherwise false. If wait_seconds is 0 or -1, then it > waits until recovery is paused and returns true. One advantage of this > function is that users don't need to call pg_is_wal_replay_paused(). > IMHO, the job of ensuring whether or not the recovery is actually > paused, is better done by the one who requests > it(pg_wal_replay_pause/pg_wal_replay_pause_and_wait). I don't think we need wait/onwait version for all the APIs, IMHO it would be enough for the user to know whether the recovery is actually paused or not and for that, we are changing pg_is_wal_replay_paused to wait for the pause. However, in the next version in pg_is_wal_replay_paused I will provide a flag so that the user can decide whether to wait for the pause or just get the request status. > [2] Is it intentional that RecoveryPauseRequested() returns true even > if XLogCtl->recoveryPauseState is either RECOVERY_PAUSE_REQUESTED or > RECOVERY_PAUSED? Yes this is intended > [3] Can we change IsRecoveryPaused() instead of RecoveryIsPaused() and > IsRecoveryPauseRequested() instead of RecoveryPauseRequested()? How > about having inline(because they have one line of code) functions like > IsRecoveryPauseRequested(), IsRecoveryPaused() and > IsRecoveryInProgress(), returning true when RECOVERY_PAUSE_REQUESTED, > RECOVERY_PAUSED and RECOVERY_IN_PROGRESS respectively? Yeah, we can do that, I am not sure whether we need IsRecoveryInProgress function though. > [4] Can we have at least one line of comments for each of the new > functions, I know the function names mean everything? And also for > existing SetRecoveryPause() and GetRecoveryPauseState()? Will do that > [5] Typo, it's "every time" ---> + * this everytime. Ok > [6] Do we reach PG_RETURN_BOOL(true); at the end of > pg_is_wal_replay_paused()? If not, is it there for satisfying the > compiler? Yes > [7] In pg_is_wal_replay_paused(), do we need if > (!RecoveryPauseRequested()) inside the for (;;)? If yes, can we add > comments about why we need it there? Yes, we need it if replay resumed during the loop, I will add the comments. > [8] Isn't it good to have > pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE) and > pgstat_report_wait_end(), just before for (;;) and at the end > respectively? This will help users in knowing what they are waiting > for? Alternatively we can issue notice/warnings/write to server log > while we are waiting in for (;;) for the recovery to get paused? I think we can do that, let me think about this and get back to you. > [9] In pg_is_wal_replay_paused(), is 10 msec sleep time chosen > randomly or based on some analysis that the time it takes to get to > recovery paused state from request state or some other? I don't think we can identify that when actually recovery can get paused. Though in pg_wal_replay_pause() we are sending a signal to all the places we are waiting for WAL and right after we are checking. But it depends upon other configuration parameters like max_standby_streaming_delay. > [10] errhint("The standby was promoted while waiting for recovery to > be paused."))); Can we know whether standby is actually promoted and > throw this error? Because the error "recovery is not in progress" here > is being thrown by just relying on if (!RecoveryInProgress()). IIUC, > using pg_promote Because before checking this it has already checked RecoveryPauseRequested() and if that is true then the RecoveryInProgress was in progress at some time and now it is not anymore and that can happen due to promotion. But I am fine with reverting to the old error that can not execute if recovery is not in progress. > [11] Can we try to add tests for these functions in TAP? Currently, we > don't have any tests for pg_is_wal_replay_paused, pg_wal_replay_resume > or pg_wal_replay_pause, but we have tests for pg_promote in timeline > switch. I will work on this. > [12] Isn't it better to change RecoveryPauseState enum > RECOVERY_IN_PROGRESS value to start from 1 instead of 0? Because when > XLogCtl shared memory is initialized, I thin
Re: Is Recovery actually paused?
On Tue, Jan 19, 2021 at 9:32 PM Dilip Kumar wrote: > In the last patch there were some local changes which I did not add to > the patch and it was giving compilation warning so fixed that along > with that I have addressed your this comment as well. Thanks for the patch. I took a look at the v5 patch, below are some comments. Please ignore if I'm repeating any of the comments discussed upthread. [1] Can we also have a wait version for pg_wal_replay_pause that waits until recovery is actually paused right after setting the recovery state to RECOVERY_PAUSE_REQUESTED? Something like this - pg_wal_replay_pause_and_wait(wait boolean, wait_seconds integer DEFAULT 60) returns boolean. It waits until either default or provided wait_seconds and returns true if the recovery is paused within that wait_seconds otherwise false. If wait_seconds is 0 or -1, then it waits until recovery is paused and returns true. One advantage of this function is that users don't need to call pg_is_wal_replay_paused(). IMHO, the job of ensuring whether or not the recovery is actually paused, is better done by the one who requests it(pg_wal_replay_pause/pg_wal_replay_pause_and_wait). [2] Is it intentional that RecoveryPauseRequested() returns true even if XLogCtl->recoveryPauseState is either RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED? [3] Can we change IsRecoveryPaused() instead of RecoveryIsPaused() and IsRecoveryPauseRequested() instead of RecoveryPauseRequested()? How about having inline(because they have one line of code) functions like IsRecoveryPauseRequested(), IsRecoveryPaused() and IsRecoveryInProgress(), returning true when RECOVERY_PAUSE_REQUESTED, RECOVERY_PAUSED and RECOVERY_IN_PROGRESS respectively? [4] Can we have at least one line of comments for each of the new functions, I know the function names mean everything? And also for existing SetRecoveryPause() and GetRecoveryPauseState()? [5] Typo, it's "every time" ---> + * this everytime. [6] Do we reach PG_RETURN_BOOL(true); at the end of pg_is_wal_replay_paused()? If not, is it there for satisfying the compiler? [7] In pg_is_wal_replay_paused(), do we need if (!RecoveryPauseRequested()) inside the for (;;)? If yes, can we add comments about why we need it there? [8] Isn't it good to have pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE) and pgstat_report_wait_end(), just before for (;;) and at the end respectively? This will help users in knowing what they are waiting for? Alternatively we can issue notice/warnings/write to server log while we are waiting in for (;;) for the recovery to get paused? [9] In pg_is_wal_replay_paused(), is 10 msec sleep time chosen randomly or based on some analysis that the time it takes to get to recovery paused state from request state or some other? [10] errhint("The standby was promoted while waiting for recovery to be paused."))); Can we know whether standby is actually promoted and throw this error? Because the error "recovery is not in progress" here is being thrown by just relying on if (!RecoveryInProgress()). IIUC, using pg_promote [11] Can we try to add tests for these functions in TAP? Currently, we don't have any tests for pg_is_wal_replay_paused, pg_wal_replay_resume or pg_wal_replay_pause, but we have tests for pg_promote in timeline switch. [12] Isn't it better to change RecoveryPauseState enum RECOVERY_IN_PROGRESS value to start from 1 instead of 0? Because when XLogCtl shared memory is initialized, I think recoveryPauseState can be 0, so should it mean recovery in progress? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Tue, Jan 19, 2021 at 8:34 AM Dilip Kumar wrote: > > On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA wrote: >> >> On Sun, 17 Jan 2021 11:33:52 +0530 >> Dilip Kumar wrote: >> >> > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA wrote: >> > > >> > > On Wed, 13 Jan 2021 17:49:43 +0530 >> > > Dilip Kumar wrote: >> > > >> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar >> > > > wrote: >> > > > > >> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA >> > > > > wrote: >> > > > > > >> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 >> > > > > > Dilip Kumar wrote: >> > > > > > >> > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused >> > > > > > > > > to wait. >> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will >> > > > > > > > > be blocked forever, >> > > > > > > > > although this setting may not be usual. In addition, some >> > > > > > > > > users may set >> > > > > > > > > recovery_min_apply_delay for a large. If such users call >> > > > > > > > > pg_is_wal_replay_paused, >> > > > > > > > > it could wait for a long time. >> > > > > > > > > >> > > > > > > > > At least, I think we need some descriptions on document to >> > > > > > > > > explain >> > > > > > > > > pg_is_wal_replay_paused could wait while a time. >> > > > > > > > >> > > > > > > > Ok >> > > > > > > >> > > > > > > Fixed this, added some comments in .sgml as well as in function >> > > > > > > header >> > > > > > >> > > > > > Thank you for fixing this. >> > > > > > >> > > > > > Also, is it better to fix the description of pg_wal_replay_pause >> > > > > > from >> > > > > > "Pauses recovery." to "Request to pause recovery." in according >> > > > > > with >> > > > > > pg_is_wal_replay_paused? >> > > > > >> > > > > Okay >> > > > > >> > > > > > >> > > > > > > > > Also, how about adding a new boolean argument to >> > > > > > > > > pg_is_wal_replay_paused to >> > > > > > > > > control whether this waits for recovery to get paused or >> > > > > > > > > not? By setting its >> > > > > > > > > default value to true or false, users can use the old format >> > > > > > > > > for calling this >> > > > > > > > > and the backward compatibility can be maintained. >> > > > > > > > >> > > > > > > > So basically, if the wait_recovery_pause flag is false then we >> > > > > > > > will >> > > > > > > > immediately return true if the pause is requested? I agree >> > > > > > > > that it is >> > > > > > > > good to have an API to know whether the recovery pause is >> > > > > > > > requested or >> > > > > > > > not but I am not sure is it good idea to make this API serve >> > > > > > > > both the >> > > > > > > > purpose? Anyone else have any thoughts on this? >> > > > > > > > >> > > > > > >> > > > > > I think the current pg_is_wal_replay_paused() already has another >> > > > > > purpose; >> > > > > > this waits recovery to actually get paused. If we want to limit >> > > > > > this API's >> > > > > > purpose only to return the pause state, it seems better to fix >> > > > > > this to return >> > > > > > the actual state at the cost of lacking the backward >> > > > > > compatibility. If we want >> > > > > > to know whether pause is requested, we may add a new API like >> > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait >> > > > > > recovery to actually >> > > > > > get paused, we may add an option to pg_wal_replay_pause() for this >> > > > > > purpose. >> > > > > > >> > > > > > However, this might be a bikeshedding. If anyone don't care that >> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I >> > > > > > don't care either. >> > > > > >> > > > > I don't think that it will be blocked ever, because >> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the >> > > > > recovery process will not be stuck on waiting for the WAL. >> > > >> > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck >> > > during resolving >> > > a recovery conflict. The process could wait for >> > > max_standby_streaming_delay or >> > > max_standby_archive_delay at most before recovery get completely paused. >> > >> > Okay, I agree that it is possible so for handling this we have a >> > couple of options >> > 1. pg_is_wal_replay_paused(), interface will wait for recovery to >> > actually get paused, but user have an option to cancel that. So I >> > agree that there is currently no option to just know that recovery >> > pause is requested without waiting for its actually get paused if it >> > is requested. So one option is we can provide an another interface as >> > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just >> > return the request status. I am not sure how useful it is. >> >> If it is acceptable that pg_is_wal_replay_paused() makes users wait, >> I'm ok for the current interface. I don't feel the need of >> pg_is_wal_replay_paluse_requeseted(). >> >> > >> > 2. Pass an option to pg_is_wal_replay_paused whether to wait
Re: Is Recovery actually paused?
On Tue, Jan 19, 2021 at 10:30 AM Kyotaro Horiguchi wrote: > > At Tue, 19 Jan 2021 11:41:18 +0900, Yugo NAGATA wrote in > > On Sun, 17 Jan 2021 11:33:52 +0530 > > Dilip Kumar wrote: > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another > > > > > > > purpose; > > > > > > > this waits recovery to actually get paused. If we want to limit > > > > > > > this API's > > > > > > > purpose only to return the pause state, it seems better to fix > > > > > > > this to return > > > > > > > the actual state at the cost of lacking the backward > > > > > > > compatibility. If we want > > > > > > > to know whether pause is requested, we may add a new API like > > > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > > > > > > > recovery to actually > > > > > > > get paused, we may add an option to pg_wal_replay_pause() for > > > > > > > this purpose. > > > > > > > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I > > > > > > > don't care either. > > > > > > > > > > > > I don't think that it will be blocked ever, because > > > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > > > > recovery process will not be stuck on waiting for the WAL. > > > > > > > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck > > > > during resolving > > > > a recovery conflict. The process could wait for > > > > max_standby_streaming_delay or > > > > max_standby_archive_delay at most before recovery get completely paused. > > > > > > Okay, I agree that it is possible so for handling this we have a > > > couple of options > > > 1. pg_is_wal_replay_paused(), interface will wait for recovery to > > > actually get paused, but user have an option to cancel that. So I > > > agree that there is currently no option to just know that recovery > > > pause is requested without waiting for its actually get paused if it > > > is requested. So one option is we can provide an another interface as > > > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just > > > return the request status. I am not sure how useful it is. > > > > If it is acceptable that pg_is_wal_replay_paused() makes users wait, > > I'm ok for the current interface. I don't feel the need of > > pg_is_wal_replay_paluse_requeseted(). > > FWIW, the name "pg_is_wal_replay_paused" is suggesting "to know > whether recovery is paused or not at present" and it would be > surprising to see it to wait for the recovery actually paused by > default. > > I think there's no functions to wait for some situation at least for > now. If we wanted to wait for some condition to make, we would loop > over check-and-wait using plpgsql. > > If you desire to wait to replication to pause by a function, I would > do that by adding a parameter to the function. > > pg_is_wal_replay_paused(OPTIONAL bool wait_for_pause) This seems to be a fair point to me. So I will add an option to the API, and if that is passed true then we will wait for recovery to get paused. otherwise, this will just return true if the pause is requested same as the current behavior. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
At Tue, 19 Jan 2021 11:41:18 +0900, Yugo NAGATA wrote in > On Sun, 17 Jan 2021 11:33:52 +0530 > Dilip Kumar wrote: > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another > > > > > > purpose; > > > > > > this waits recovery to actually get paused. If we want to limit > > > > > > this API's > > > > > > purpose only to return the pause state, it seems better to fix this > > > > > > to return > > > > > > the actual state at the cost of lacking the backward compatibility. > > > > > > If we want > > > > > > to know whether pause is requested, we may add a new API like > > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > > > > > > recovery to actually > > > > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > > > > purpose. > > > > > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I > > > > > > don't care either. > > > > > > > > > > I don't think that it will be blocked ever, because > > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > > > recovery process will not be stuck on waiting for the WAL. > > > > > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck > > > during resolving > > > a recovery conflict. The process could wait for > > > max_standby_streaming_delay or > > > max_standby_archive_delay at most before recovery get completely paused. > > > > Okay, I agree that it is possible so for handling this we have a > > couple of options > > 1. pg_is_wal_replay_paused(), interface will wait for recovery to > > actually get paused, but user have an option to cancel that. So I > > agree that there is currently no option to just know that recovery > > pause is requested without waiting for its actually get paused if it > > is requested. So one option is we can provide an another interface as > > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just > > return the request status. I am not sure how useful it is. > > If it is acceptable that pg_is_wal_replay_paused() makes users wait, > I'm ok for the current interface. I don't feel the need of > pg_is_wal_replay_paluse_requeseted(). FWIW, the name "pg_is_wal_replay_paused" is suggesting "to know whether recovery is paused or not at present" and it would be surprising to see it to wait for the recovery actually paused by default. I think there's no functions to wait for some situation at least for now. If we wanted to wait for some condition to make, we would loop over check-and-wait using plpgsql. If you desire to wait to replication to pause by a function, I would do that by adding a parameter to the function. pg_is_wal_replay_paused(OPTIONAL bool wait_for_pause) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Is Recovery actually paused?
On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA wrote: > On Sun, 17 Jan 2021 11:33:52 +0530 > Dilip Kumar wrote: > > > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA wrote: > > > > > > On Wed, 13 Jan 2021 17:49:43 +0530 > > > Dilip Kumar wrote: > > > > > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar > wrote: > > > > > > > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA > wrote: > > > > > > > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > > > > Dilip Kumar wrote: > > > > > > > > > > > > > > > However, I wonder users don't expect > pg_is_wal_replay_paused to wait. > > > > > > > > > Especially, if max_standby_streaming_delay is -1, this > will be blocked forever, > > > > > > > > > although this setting may not be usual. In addition, some > users may set > > > > > > > > > recovery_min_apply_delay for a large. If such users call > pg_is_wal_replay_paused, > > > > > > > > > it could wait for a long time. > > > > > > > > > > > > > > > > > > At least, I think we need some descriptions on document to > explain > > > > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > > > > > > > Ok > > > > > > > > > > > > > > Fixed this, added some comments in .sgml as well as in > function header > > > > > > > > > > > > Thank you for fixing this. > > > > > > > > > > > > Also, is it better to fix the description of pg_wal_replay_pause > from > > > > > > "Pauses recovery." to "Request to pause recovery." in according > with > > > > > > pg_is_wal_replay_paused? > > > > > > > > > > Okay > > > > > > > > > > > > > > > > > > > > Also, how about adding a new boolean argument to > pg_is_wal_replay_paused to > > > > > > > > > control whether this waits for recovery to get paused or > not? By setting its > > > > > > > > > default value to true or false, users can use the old > format for calling this > > > > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then > we will > > > > > > > > immediately return true if the pause is requested? I agree > that it is > > > > > > > > good to have an API to know whether the recovery pause is > requested or > > > > > > > > not but I am not sure is it good idea to make this API serve > both the > > > > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has > another purpose; > > > > > > this waits recovery to actually get paused. If we want to limit > this API's > > > > > > purpose only to return the pause state, it seems better to fix > this to return > > > > > > the actual state at the cost of lacking the backward > compatibility. If we want > > > > > > to know whether pause is requested, we may add a new API like > > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > recovery to actually > > > > > > get paused, we may add an option to pg_wal_replay_pause() for > this purpose. > > > > > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I > don't care either. > > > > > > > > > > I don't think that it will be blocked ever, because > > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > > > recovery process will not be stuck on waiting for the WAL. > > > > > > Yes, there is no stuck on waiting for the WAL. However, it can be > stuck during resolving > > > a recovery conflict. The process could wait for > max_standby_streaming_delay or > > > max_standby_archive_delay at most before recovery get completely > paused. > > > > Okay, I agree that it is possible so for handling this we have a > > couple of options > > 1. pg_is_wal_replay_paused(), interface will wait for recovery to > > actually get paused, but user have an option to cancel that. So I > > agree that there is currently no option to just know that recovery > > pause is requested without waiting for its actually get paused if it > > is requested. So one option is we can provide an another interface as > > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just > > return the request status. I am not sure how useful it is. > > If it is acceptable that pg_is_wal_replay_paused() makes users wait, > I'm ok for the current interface. I don't feel the need of > pg_is_wal_replay_paluse_requeseted(). > > > > > 2. Pass an option to pg_is_wal_replay_paused whether to wait for > > recovery to actually get paused or not. > > > > 3. Pass an option to pg_wal_replay_pause(), whether to wait for > > recovery pause or just request and return. > > > > I like the option 1, any other opinion on this? > > > > > Also, it could wait for recovery_min_apply_delay if it has a valid > value. It is possible > > > that a user set this parameter to a large value, so it could wait for > a long time. However, > > > this will be avoided by calling recoveryPausesHere() or > CheckAndSet
Re: Is Recovery actually paused?
On Sun, 17 Jan 2021 11:33:52 +0530 Dilip Kumar wrote: > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA wrote: > > > > On Wed, 13 Jan 2021 17:49:43 +0530 > > Dilip Kumar wrote: > > > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar wrote: > > > > > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA wrote: > > > > > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > > > Dilip Kumar wrote: > > > > > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to > > > > > > > > wait. > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be > > > > > > > > blocked forever, > > > > > > > > although this setting may not be usual. In addition, some users > > > > > > > > may set > > > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > > > pg_is_wal_replay_paused, > > > > > > > > it could wait for a long time. > > > > > > > > > > > > > > > > At least, I think we need some descriptions on document to > > > > > > > > explain > > > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > > > > > Ok > > > > > > > > > > > > Fixed this, added some comments in .sgml as well as in function > > > > > > header > > > > > > > > > > Thank you for fixing this. > > > > > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > > > "Pauses recovery." to "Request to pause recovery." in according with > > > > > pg_is_wal_replay_paused? > > > > > > > > Okay > > > > > > > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > > > pg_is_wal_replay_paused to > > > > > > > > control whether this waits for recovery to get paused or not? > > > > > > > > By setting its > > > > > > > > default value to true or false, users can use the old format > > > > > > > > for calling this > > > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we > > > > > > > will > > > > > > > immediately return true if the pause is requested? I agree that > > > > > > > it is > > > > > > > good to have an API to know whether the recovery pause is > > > > > > > requested or > > > > > > > not but I am not sure is it good idea to make this API serve both > > > > > > > the > > > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another > > > > > purpose; > > > > > this waits recovery to actually get paused. If we want to limit this > > > > > API's > > > > > purpose only to return the pause state, it seems better to fix this > > > > > to return > > > > > the actual state at the cost of lacking the backward compatibility. > > > > > If we want > > > > > to know whether pause is requested, we may add a new API like > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > > > > > recovery to actually > > > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > > > purpose. > > > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't > > > > > care either. > > > > > > > > I don't think that it will be blocked ever, because > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > > recovery process will not be stuck on waiting for the WAL. > > > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck > > during resolving > > a recovery conflict. The process could wait for max_standby_streaming_delay > > or > > max_standby_archive_delay at most before recovery get completely paused. > > Okay, I agree that it is possible so for handling this we have a > couple of options > 1. pg_is_wal_replay_paused(), interface will wait for recovery to > actually get paused, but user have an option to cancel that. So I > agree that there is currently no option to just know that recovery > pause is requested without waiting for its actually get paused if it > is requested. So one option is we can provide an another interface as > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just > return the request status. I am not sure how useful it is. If it is acceptable that pg_is_wal_replay_paused() makes users wait, I'm ok for the current interface. I don't feel the need of pg_is_wal_replay_paluse_requeseted(). > > 2. Pass an option to pg_is_wal_replay_paused whether to wait for > recovery to actually get paused or not. > > 3. Pass an option to pg_wal_replay_pause(), whether to wait for > recovery pause or just request and return. > > I like the option 1, any other opinion on this? > > > Also, it could wait for recovery_min_apply_delay if it has a valid value. > > It is possible > > that a user set this parameter to a large value, so it could wait for a > > long time. However, > > this will be avoided by calling recoveryPausesHere()
Re: Is Recovery actually paused?
On Sun, Jan 17, 2021 at 1:48 PM Dilip Kumar wrote: > > On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada wrote: > > > > On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar wrote: > > > > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar wrote: > > > > > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA wrote: > > > > > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > > > Dilip Kumar wrote: > > > > > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to > > > > > > > > wait. > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be > > > > > > > > blocked forever, > > > > > > > > although this setting may not be usual. In addition, some users > > > > > > > > may set > > > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > > > pg_is_wal_replay_paused, > > > > > > > > it could wait for a long time. > > > > > > > > > > > > > > > > At least, I think we need some descriptions on document to > > > > > > > > explain > > > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > > > > > Ok > > > > > > > > > > > > Fixed this, added some comments in .sgml as well as in function > > > > > > header > > > > > > > > > > Thank you for fixing this. > > > > > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > > > "Pauses recovery." to "Request to pause recovery." in according with > > > > > pg_is_wal_replay_paused? > > > > > > > > Okay > > > > > > > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > > > pg_is_wal_replay_paused to > > > > > > > > control whether this waits for recovery to get paused or not? > > > > > > > > By setting its > > > > > > > > default value to true or false, users can use the old format > > > > > > > > for calling this > > > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we > > > > > > > will > > > > > > > immediately return true if the pause is requested? I agree that > > > > > > > it is > > > > > > > good to have an API to know whether the recovery pause is > > > > > > > requested or > > > > > > > not but I am not sure is it good idea to make this API serve both > > > > > > > the > > > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another > > > > > purpose; > > > > > this waits recovery to actually get paused. If we want to limit this > > > > > API's > > > > > purpose only to return the pause state, it seems better to fix this > > > > > to return > > > > > the actual state at the cost of lacking the backward compatibility. > > > > > If we want > > > > > to know whether pause is requested, we may add a new API like > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait > > > > > recovery to actually > > > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > > > purpose. > > > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't > > > > > care either. > > > > > > > > I don't think that it will be blocked ever, because > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > > recovery process will not be stuck on waiting for the WAL. > > > > > > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, > > > > > > > > I can not cancel > > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the > > > > > > > > waiting loop. > > > > > > > > > > How about this fix? I think users may want to cancel > > > > > pg_is_wal_replay_paused() during > > > > > this is blocking. > > > > > > > > Yeah, we can do this. I will send the updated patch after putting > > > > some more thought into these comments. Thanks again for the feedback. > > > > > > > > > > Please find the updated patch. > > > > I've looked at the patch. Here are review comments: > > > > + /* Recovery pause state */ > > + RecoveryPauseState recoveryPause; > > > > Now that the value can have tri-state, how about renaming it to > > recoveryPauseState? > > This makes sense to me. > > > --- > > bool > > RecoveryIsPaused(void) > > +{ > > + boolrecoveryPause; > > + > > + SpinLockAcquire(&XLogCtl->info_lck); > > + recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ? > > true : false; > > + SpinLockRelease(&XLogCtl->info_lck); > > + > > + return recoveryPause; > > +} > > + > > +bool > > +RecoveryPauseRequested(void) > > { > > boolrecoveryPause; > > > > SpinLockAcquire(&XLogCtl->info_lck); > > - recoveryPause = XLogCtl->recoveryPause; > > + recoveryPause = (XLogCtl->recoveryPause != > > RECOVERY_IN_PROGRESS) ? true : false; > > SpinLockRelease(&XLogCtl->i
Re: Is Recovery actually paused?
On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada wrote: > > On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar wrote: > > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar wrote: > > > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA wrote: > > > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > > Dilip Kumar wrote: > > > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to > > > > > > > wait. > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be > > > > > > > blocked forever, > > > > > > > although this setting may not be usual. In addition, some users > > > > > > > may set > > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > > pg_is_wal_replay_paused, > > > > > > > it could wait for a long time. > > > > > > > > > > > > > > At least, I think we need some descriptions on document to explain > > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > > > Ok > > > > > > > > > > Fixed this, added some comments in .sgml as well as in function header > > > > > > > > Thank you for fixing this. > > > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > > "Pauses recovery." to "Request to pause recovery." in according with > > > > pg_is_wal_replay_paused? > > > > > > Okay > > > > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > > pg_is_wal_replay_paused to > > > > > > > control whether this waits for recovery to get paused or not? By > > > > > > > setting its > > > > > > > default value to true or false, users can use the old format for > > > > > > > calling this > > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > > > > immediately return true if the pause is requested? I agree that it > > > > > > is > > > > > > good to have an API to know whether the recovery pause is requested > > > > > > or > > > > > > not but I am not sure is it good idea to make this API serve both > > > > > > the > > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another > > > > purpose; > > > > this waits recovery to actually get paused. If we want to limit this > > > > API's > > > > purpose only to return the pause state, it seems better to fix this to > > > > return > > > > the actual state at the cost of lacking the backward compatibility. If > > > > we want > > > > to know whether pause is requested, we may add a new API like > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery > > > > to actually > > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > > purpose. > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't > > > > care either. > > > > > > I don't think that it will be blocked ever, because > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > recovery process will not be stuck on waiting for the WAL. > > > > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I > > > > > > > can not cancel > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the > > > > > > > waiting loop. > > > > > > > > How about this fix? I think users may want to cancel > > > > pg_is_wal_replay_paused() during > > > > this is blocking. > > > > > > Yeah, we can do this. I will send the updated patch after putting > > > some more thought into these comments. Thanks again for the feedback. > > > > > > > Please find the updated patch. > > I've looked at the patch. Here are review comments: > > + /* Recovery pause state */ > + RecoveryPauseState recoveryPause; > > Now that the value can have tri-state, how about renaming it to > recoveryPauseState? This makes sense to me. > --- > bool > RecoveryIsPaused(void) > +{ > + boolrecoveryPause; > + > + SpinLockAcquire(&XLogCtl->info_lck); > + recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ? > true : false; > + SpinLockRelease(&XLogCtl->info_lck); > + > + return recoveryPause; > +} > + > +bool > +RecoveryPauseRequested(void) > { > boolrecoveryPause; > > SpinLockAcquire(&XLogCtl->info_lck); > - recoveryPause = XLogCtl->recoveryPause; > + recoveryPause = (XLogCtl->recoveryPause != > RECOVERY_IN_PROGRESS) ? true : false; > SpinLockRelease(&XLogCtl->info_lck); > > return recoveryPause; > } > > We can write like recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED); In RecoveryPauseRequested, we just want to know whether the pause is requested or not, even if the pause requested and not yet pause then also we want to return true. So how recoveryPause = (XLogCtl->recoveryPause ==
Re: Is Recovery actually paused?
On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA wrote: > > On Wed, 13 Jan 2021 17:49:43 +0530 > Dilip Kumar wrote: > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar wrote: > > > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA wrote: > > > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > > Dilip Kumar wrote: > > > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to > > > > > > > wait. > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be > > > > > > > blocked forever, > > > > > > > although this setting may not be usual. In addition, some users > > > > > > > may set > > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > > pg_is_wal_replay_paused, > > > > > > > it could wait for a long time. > > > > > > > > > > > > > > At least, I think we need some descriptions on document to explain > > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > > > Ok > > > > > > > > > > Fixed this, added some comments in .sgml as well as in function header > > > > > > > > Thank you for fixing this. > > > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > > "Pauses recovery." to "Request to pause recovery." in according with > > > > pg_is_wal_replay_paused? > > > > > > Okay > > > > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > > pg_is_wal_replay_paused to > > > > > > > control whether this waits for recovery to get paused or not? By > > > > > > > setting its > > > > > > > default value to true or false, users can use the old format for > > > > > > > calling this > > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > > > > immediately return true if the pause is requested? I agree that it > > > > > > is > > > > > > good to have an API to know whether the recovery pause is requested > > > > > > or > > > > > > not but I am not sure is it good idea to make this API serve both > > > > > > the > > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another > > > > purpose; > > > > this waits recovery to actually get paused. If we want to limit this > > > > API's > > > > purpose only to return the pause state, it seems better to fix this to > > > > return > > > > the actual state at the cost of lacking the backward compatibility. If > > > > we want > > > > to know whether pause is requested, we may add a new API like > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery > > > > to actually > > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > > purpose. > > > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't > > > > care either. > > > > > > I don't think that it will be blocked ever, because > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > > recovery process will not be stuck on waiting for the WAL. > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck > during resolving > a recovery conflict. The process could wait for max_standby_streaming_delay or > max_standby_archive_delay at most before recovery get completely paused. Okay, I agree that it is possible so for handling this we have a couple of options 1. pg_is_wal_replay_paused(), interface will wait for recovery to actually get paused, but user have an option to cancel that. So I agree that there is currently no option to just know that recovery pause is requested without waiting for its actually get paused if it is requested. So one option is we can provide an another interface as you mentioned pg_is_wal_replay_paluse_requeseted(), which can just return the request status. I am not sure how useful it is. 2. Pass an option to pg_is_wal_replay_paused whether to wait for recovery to actually get paused or not. 3. Pass an option to pg_wal_replay_pause(), whether to wait for recovery pause or just request and return. I like the option 1, any other opinion on this? > Also, it could wait for recovery_min_apply_delay if it has a valid value. It > is possible > that a user set this parameter to a large value, so it could wait for a long > time. However, > this will be avoided by calling recoveryPausesHere() or > CheckAndSetRecoveryPause() in > recoveryApplyDelay(). Right > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I > > > > > > > can not cancel > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the > > > > > > > waiting loop. > > > > > > > > How about this fix? I think users may want to cancel > > > > pg_is_wal_replay_paused() during > > > > this is blocking. > > > > > > Yeah, we can do this. I will send the updated patch aft
Re: Is Recovery actually paused?
On Sun, Jan 17, 2021 at 3:52 AM Masahiko Sawada wrote: > > On Sat, Jan 16, 2021 at 12:28 PM Masahiko Sawada > wrote: > > > > --- > > + /* test for recovery pause if user has requested the pause */ > > + if (((volatile XLogCtlData *) XLogCtl)->recoveryPause) > > + recoveryPausesHere(false); > > + > > + now = GetCurrentTimestamp(); > > + > > > > Hmm, if the recovery pauses here, the wal receiver isn't launched even > > when wal_retrieve_retry_interval has passed, which seems not good. I > > think we want the recovery to be paused but want the wal receiver to > > continue receiving WAL. > > I had misunderstood the code and the patch, please ignore this > comment. Pausing the recovery here is fine with me. Thanks for the review Sawada-San, I will work on your other comments and post the patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Is Recovery actually paused?
On Sat, Jan 16, 2021 at 12:28 PM Masahiko Sawada wrote: > > --- > + /* test for recovery pause if user has requested the pause */ > + if (((volatile XLogCtlData *) XLogCtl)->recoveryPause) > + recoveryPausesHere(false); > + > + now = GetCurrentTimestamp(); > + > > Hmm, if the recovery pauses here, the wal receiver isn't launched even > when wal_retrieve_retry_interval has passed, which seems not good. I > think we want the recovery to be paused but want the wal receiver to > continue receiving WAL. I had misunderstood the code and the patch, please ignore this comment. Pausing the recovery here is fine with me. Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Re: Is Recovery actually paused?
On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar wrote: > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar wrote: > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA wrote: > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > Dilip Kumar wrote: > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to > > > > > > wait. > > > > > > Especially, if max_standby_streaming_delay is -1, this will be > > > > > > blocked forever, > > > > > > although this setting may not be usual. In addition, some users may > > > > > > set > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > pg_is_wal_replay_paused, > > > > > > it could wait for a long time. > > > > > > > > > > > > At least, I think we need some descriptions on document to explain > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > Ok > > > > > > > > Fixed this, added some comments in .sgml as well as in function header > > > > > > Thank you for fixing this. > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > "Pauses recovery." to "Request to pause recovery." in according with > > > pg_is_wal_replay_paused? > > > > Okay > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > pg_is_wal_replay_paused to > > > > > > control whether this waits for recovery to get paused or not? By > > > > > > setting its > > > > > > default value to true or false, users can use the old format for > > > > > > calling this > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > > > immediately return true if the pause is requested? I agree that it is > > > > > good to have an API to know whether the recovery pause is requested or > > > > > not but I am not sure is it good idea to make this API serve both the > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another purpose; > > > this waits recovery to actually get paused. If we want to limit this API's > > > purpose only to return the pause state, it seems better to fix this to > > > return > > > the actual state at the cost of lacking the backward compatibility. If we > > > want > > > to know whether pause is requested, we may add a new API like > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery > > > to actually > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > purpose. > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't > > > care either. > > > > I don't think that it will be blocked ever, because > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > recovery process will not be stuck on waiting for the WAL. > > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I > > > > > > can not cancel > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the > > > > > > waiting loop. > > > > > > How about this fix? I think users may want to cancel > > > pg_is_wal_replay_paused() during > > > this is blocking. > > > > Yeah, we can do this. I will send the updated patch after putting > > some more thought into these comments. Thanks again for the feedback. > > > > Please find the updated patch. I've looked at the patch. Here are review comments: + /* Recovery pause state */ + RecoveryPauseState recoveryPause; Now that the value can have tri-state, how about renaming it to recoveryPauseState? --- bool RecoveryIsPaused(void) +{ + boolrecoveryPause; + + SpinLockAcquire(&XLogCtl->info_lck); + recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ? true : false; + SpinLockRelease(&XLogCtl->info_lck); + + return recoveryPause; +} + +bool +RecoveryPauseRequested(void) { boolrecoveryPause; SpinLockAcquire(&XLogCtl->info_lck); - recoveryPause = XLogCtl->recoveryPause; + recoveryPause = (XLogCtl->recoveryPause != RECOVERY_IN_PROGRESS) ? true : false; SpinLockRelease(&XLogCtl->info_lck); return recoveryPause; } We can write like recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED); Also, since these functions do the almost same thing, I think we can have a common function to get XLogCtl->recoveryPause, say GetRecoveryPauseState() or GetRecoveryPause(), and both RecoveryIsPaused() and RecoveryPauseRequested() use the returned value. What do you think? --- +static void +CheckAndSetRecoveryPause(void) Maybe we need to declare the prototype of this function like other functions in xlog.c. --- + /* +* If recovery is not in progress anymore then report an error this +* could happen if the standby is promoted while we were waiting for +*
Re: Is Recovery actually paused?
On Wed, 13 Jan 2021 17:49:43 +0530 Dilip Kumar wrote: > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar wrote: > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA wrote: > > > > > > On Thu, 10 Dec 2020 11:25:23 +0530 > > > Dilip Kumar wrote: > > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to > > > > > > wait. > > > > > > Especially, if max_standby_streaming_delay is -1, this will be > > > > > > blocked forever, > > > > > > although this setting may not be usual. In addition, some users may > > > > > > set > > > > > > recovery_min_apply_delay for a large. If such users call > > > > > > pg_is_wal_replay_paused, > > > > > > it could wait for a long time. > > > > > > > > > > > > At least, I think we need some descriptions on document to explain > > > > > > pg_is_wal_replay_paused could wait while a time. > > > > > > > > > > Ok > > > > > > > > Fixed this, added some comments in .sgml as well as in function header > > > > > > Thank you for fixing this. > > > > > > Also, is it better to fix the description of pg_wal_replay_pause from > > > "Pauses recovery." to "Request to pause recovery." in according with > > > pg_is_wal_replay_paused? > > > > Okay > > > > > > > > > > > Also, how about adding a new boolean argument to > > > > > > pg_is_wal_replay_paused to > > > > > > control whether this waits for recovery to get paused or not? By > > > > > > setting its > > > > > > default value to true or false, users can use the old format for > > > > > > calling this > > > > > > and the backward compatibility can be maintained. > > > > > > > > > > So basically, if the wait_recovery_pause flag is false then we will > > > > > immediately return true if the pause is requested? I agree that it is > > > > > good to have an API to know whether the recovery pause is requested or > > > > > not but I am not sure is it good idea to make this API serve both the > > > > > purpose? Anyone else have any thoughts on this? > > > > > > > > > > > I think the current pg_is_wal_replay_paused() already has another purpose; > > > this waits recovery to actually get paused. If we want to limit this API's > > > purpose only to return the pause state, it seems better to fix this to > > > return > > > the actual state at the cost of lacking the backward compatibility. If we > > > want > > > to know whether pause is requested, we may add a new API like > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery > > > to actually > > > get paused, we may add an option to pg_wal_replay_pause() for this > > > purpose. > > > > > > However, this might be a bikeshedding. If anyone don't care that > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't > > > care either. > > > > I don't think that it will be blocked ever, because > > pg_wal_replay_pause is sending the WakeupRecovery() which means the > > recovery process will not be stuck on waiting for the WAL. Yes, there is no stuck on waiting for the WAL. However, it can be stuck during resolving a recovery conflict. The process could wait for max_standby_streaming_delay or max_standby_archive_delay at most before recovery get completely paused. Also, it could wait for recovery_min_apply_delay if it has a valid value. It is possible that a user set this parameter to a large value, so it could wait for a long time. However, this will be avoided by calling recoveryPausesHere() or CheckAndSetRecoveryPause() in recoveryApplyDelay(). > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I > > > > > > can not cancel > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the > > > > > > waiting loop. > > > > > > How about this fix? I think users may want to cancel > > > pg_is_wal_replay_paused() during > > > this is blocking. > > > > Yeah, we can do this. I will send the updated patch after putting > > some more thought into these comments. Thanks again for the feedback. > > > > Please find the updated patch. Thanks. I confirmed that I can cancel pg_is_wal_repaly_paused() during stuck. Although it is a very trivial comment, I think that the new line before HandleStartupProcInterrupts() is unnecessary. @@ -6052,12 +6062,20 @@ recoveryPausesHere(bool endOfRecovery) (errmsg("recovery has paused"), errhint("Execute pg_wal_replay_resume() to continue."))); - while (RecoveryIsPaused()) + while (RecoveryPauseRequested()) { + HandleStartupProcInterrupts(); Regards, Yugo Nagata -- Yugo NAGATA