Re: Is Recovery actually paused?

2021-03-11 Thread Dilip Kumar
On Fri, 12 Mar 2021 at 2:04 AM, Robert Haas  wrote:

> On Mon, Mar 1, 2021 at 12:08 AM Dilip Kumar  wrote:
> > > > > I'll wait for a day before marking this RfC in case anyone have
> > > > > further comments.
> > > >
> > > > Okay.
> > >
> > > Hearing nothing, done that.
> >
> > Thanks.
>
> Committed with minor cosmetic changes.


Thanks Robert.

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Is Recovery actually paused?

2021-03-11 Thread Robert Haas
On Mon, Mar 1, 2021 at 12:08 AM Dilip Kumar  wrote:
> > > > I'll wait for a day before marking this RfC in case anyone have
> > > > further comments.
> > >
> > > Okay.
> >
> > Hearing nothing, done that.
>
> Thanks.

Committed with minor cosmetic changes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-28 Thread Dilip Kumar
On Fri, Feb 26, 2021 at 1:33 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 25 Feb 2021 13:22:53 +0530, Dilip Kumar  wrote 
> in
> > On Thu, Feb 25, 2021 at 12:42 PM Kyotaro Horiguchi
> >  wrote:
> > > The latest version applies (almost) cleanly to the current master and
> > > works fine.
> > > I don't have further comment on this.
> > >
> > > I'll wait for a day before marking this RfC in case anyone have
> > > further comments.
> >
> > Okay.
>
> Hearing nothing, done that.

Thanks.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-26 Thread Kyotaro Horiguchi
At Thu, 25 Feb 2021 13:22:53 +0530, Dilip Kumar  wrote 
in 
> On Thu, Feb 25, 2021 at 12:42 PM Kyotaro Horiguchi
>  wrote:
> > The latest version applies (almost) cleanly to the current master and
> > works fine.
> > I don't have further comment on this.
> >
> > I'll wait for a day before marking this RfC in case anyone have
> > further comments.
> 
> Okay.

Hearing nothing, done that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-24 Thread Dilip Kumar
On Thu, Feb 25, 2021 at 12:42 PM Kyotaro Horiguchi
 wrote:
>
> Thanks for your patience and sorry for having annoyed you.

Thank you very much for your review and inputs.

> The latest version applies (almost) cleanly to the current master and
> works fine.
> I don't have further comment on this.
>
> I'll wait for a day before marking this RfC in case anyone have
> further comments.

Okay.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-24 Thread Kyotaro Horiguchi
At Thu, 25 Feb 2021 09:49:15 +0530, Dilip Kumar  wrote 
in 
> On Thu, Feb 25, 2021 at 6:52 AM Kyotaro Horiguchi
>  wrote:
> 
> > Recently we have mildly changed to the direction to utilize the
> > compiler warning about enum coverage in switch struct. (Maybe we need
> > another compiler option that enables that check for switch'es with the
> > default case, though.) In that light, the direction is a switch
> > without the default case then Assert if none of the cases is stepped
> > on. This is what apply_dispatch does. Slightly different version of
> > the same would be the following.  This is more natural than the above.
> >
> >   statestr = NULL;
> >   swtich(state)
> >   {
> > case RECOVERY_NOT_PAUSED:
> >   statestr = "not paused";
> >   break;
> > ...
> >   }
> >
> >   Assert (statestr != NULL);
> >   return cstring_to_text(statestr);
> >
> > If the enum had many (more than ten or so?) values and it didn't seem
> > stable I push that a bit strongly but it actually consists of only
> > three values and not likely to get further values.  So I don't insist
> > on the style so strongly here.
> >
> 
> Changed as per the suggestion.

Thanks for your patience and sorry for having annoyed you.

The latest version applies (almost) cleanly to the current master and
works fine.
I don't have further comment on this.

I'll wait for a day before marking this RfC in case anyone have
further comments.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-24 Thread Dilip Kumar
On Thu, Feb 25, 2021 at 6:52 AM Kyotaro Horiguchi
 wrote:

> Recently we have mildly changed to the direction to utilize the
> compiler warning about enum coverage in switch struct. (Maybe we need
> another compiler option that enables that check for switch'es with the
> default case, though.) In that light, the direction is a switch
> without the default case then Assert if none of the cases is stepped
> on. This is what apply_dispatch does. Slightly different version of
> the same would be the following.  This is more natural than the above.
>
>   statestr = NULL;
>   swtich(state)
>   {
> case RECOVERY_NOT_PAUSED:
>   statestr = "not paused";
>   break;
> ...
>   }
>
>   Assert (statestr != NULL);
>   return cstring_to_text(statestr);
>
> If the enum had many (more than ten or so?) values and it didn't seem
> stable I push that a bit strongly but it actually consists of only
> three values and not likely to get further values.  So I don't insist
> on the style so strongly here.
>

Changed as per the suggestion.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 3a5848b49a808ed5cde0f1d38bcaf0bba112cc54 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 27 Jan 2021 16:46:04 +0530
Subject: [PATCH v16] Provide a new interface to get the recovery pause status

Currently, pg_is_wal_replay_paused, just checks whether the recovery
pause is requested or not but it doesn't actually tell whether the
recovery is actually paused or not.  So basically there is no way for
the user to know the actual status of the pause request.  This patch
provides a new interface pg_get_wal_replay_pause_state that will
return the actual status of the recovery pause i.e.'not paused' if
pause is not requested, 'pause requested' if pause is requested but
recovery is not yet paused and 'paused' if recovery is actually paused.
---
 doc/src/sgml/func.sgml | 32 +++--
 src/backend/access/transam/xlog.c  | 86 --
 src/backend/access/transam/xlogfuncs.c | 45 +-
 src/include/access/xlog.h  | 10 +++-
 src/include/catalog/pg_proc.dat|  4 ++
 5 files changed, 155 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d822427..500bc9b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 boolean


-Returns true if recovery is paused.
+Returns true if recovery pause is requested.
+   
+  
+
+  
+   
+
+ pg_get_wal_replay_pause_state
+
+pg_get_wal_replay_pause_state ()
+text
+   
+   
+Returns recovery pause state.  The return values are 
+not paused if pause is not requested, 
+pause requested if pause is requested but recovery is
+not yet paused and, paused if the recovery is
+actually paused.

   
 
@@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 void


-Pauses recovery.  While recovery is paused, no further database
-changes are applied.  If hot standby is active, all new queries will
-see the same consistent snapshot of the database, and no further query
-conflicts will be generated until recovery is resumed.
+Request to pause recovery.  A request doesn't mean that recovery stops
+right away.  If you want a guarantee that recovery is actually paused,
+you need to check for the recovery pause state returned by
+pg_get_wal_replay_pause_state().  Note that
+pg_is_wal_replay_paused() returns whether a request
+is made.  While recovery is paused, no further database changes are applied.
+If hot standby is active, all new queries will see the same consistent
+snapshot of the database, and no further query conflicts will be generated
+until recovery is resumed.


 This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e0c37f7..5c94be7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -721,8 +721,8 @@ typedef struct XLogCtlData
 	 * only relevant for replication or archive recovery
 	 */
 	TimestampTz currentChunkStartTime;
-	/* Are we requested to pause recovery? */
-	bool		recoveryPause;
+	/* Recovery pause state */
+	RecoveryPauseState	recoveryPauseState;
 
 	/*
 	 * lastFpwDisableRecPtr points to the start of the last replayed
@@ -894,6 +894,7 @@ static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);

Re: Is Recovery actually paused?

2021-02-24 Thread Kyotaro Horiguchi
At Wed, 24 Feb 2021 15:25:57 +0530, Dilip Kumar  wrote 
in 
> On Wed, Feb 24, 2021 at 2:26 PM Kyotaro Horiguchi
>  wrote:
> > I should have took the meaning of "confirm" wrongly.  I took that as
> > "somehow determine if the recovery is to be paused".  If that reading
> > is completely wrong, I don't mind either re-chaging the function name
> > or leaving all it alone.
> 
> I am fine with leaving it the way it is unless someone feels that we
> should change it.

Understood. I don't stick to the change.

> > > > This disables the static enum coverage check and it is not likely to
> > > > have a wrong value here, other than the case of shared memory
> > > > corruption.  So we can remove the default case
> > > > here. pg_get_replication_slots() is going that direction and
> > > > apply_dispatch() is taking a slightly different way.  Anyway I think
> > > > that we can take away the default case.
> > >
> > > So do you think we should put an assert(0) in the default case?
> >
> > No.  Just removing the default in the switch.  If the value comes from
> > some other source typically from disk or user-interraction, the
> > default is necessary, but, in the first place if we have other than
> > the defined value there, it is a sign of something worse than
> > ERROR. If we care about that case, we *could* do the same thing with
> > apply_dispatch().
> >
> >   switch (GetRecoveryPauseState())
> >   {
> >  case RECOVERY_NOT_PAUSED:
> >  return cstring_to_text("not paused");
> >  ..
> >}
> >
> >/* we shouldn't reach here */
> >Assert (0);
> 
> I think for such cases IMHO the preferred style for PostgreSQL is that
> we add Assert(0) in the default case,  at least it appeared to me that
> way.

Recently we have mildly changed to the direction to utilize the
compiler warning about enum coverage in switch struct. (Maybe we need
another compiler option that enables that check for switch'es with the
default case, though.) In that light, the direction is a switch
without the default case then Assert if none of the cases is stepped
on. This is what apply_dispatch does. Slightly different version of
the same would be the following.  This is more natural than the above.

  statestr = NULL;
  swtich(state)
  {
case RECOVERY_NOT_PAUSED:
  statestr = "not paused";
  break;
...
  }
  
  Assert (statestr != NULL);
  return cstring_to_text(statestr);

If the enum had many (more than ten or so?) values and it didn't seem
stable I push that a bit strongly but it actually consists of only
three values and not likely to get further values.  So I don't insist
on the style so strongly here.

In short, I'm fine with default: Assert(0) if you still don't like the
just above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-24 Thread Dilip Kumar
On Wed, Feb 24, 2021 at 3:25 PM Dilip Kumar  wrote:
>

> > > > The reason for the checkpoint is to move to "paused" state in a
> > > > reasonable time.  I think we need to mention that reason rather than
> > > > what is done here.
> > >
> > > I will do that.

I have fixed this.

> > > >
> > > > +   /* get the recovery pause state */
> > > > +   switch(GetRecoveryPauseState())
> > > > +   {
> > > > +   case RECOVERY_NOT_PAUSED:
> > > > +   state = "not paused";
> > > > +   break;
> > > > ...
> > > > +   default:
> > > > +   elog(ERROR, "invalid recovery pause state");

>
> I think for such cases IMHO the preferred style for PostgreSQL is that
> we add Assert(0) in the default case,  at least it appeared to me that
> way.

Added an Assert(0) in default case.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 5eef4c3d7e2aa3c2858031566f2da3b0075e361a Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 27 Jan 2021 16:46:04 +0530
Subject: [PATCH v15] Provide a new interface to get the recovery pause status

Currently, pg_is_wal_replay_paused, just checks whether the recovery
pause is requested or not but it doesn't actually tell whether the
recovery is actually paused or not.  So basically there is no way for
the user to know the actual status of the pause request.  This patch
provides a new interface pg_get_wal_replay_pause_state that will
return the actual status of the recovery pause i.e.'not paused' if
pause is not requested, 'pause requested' if pause is requested but
recovery is not yet paused and 'paused' if recovery is actually paused.
---
 doc/src/sgml/func.sgml | 32 +++--
 src/backend/access/transam/xlog.c  | 86 --
 src/backend/access/transam/xlogfuncs.c | 47 ++-
 src/include/access/xlog.h  | 10 +++-
 src/include/catalog/pg_proc.dat|  4 ++
 5 files changed, 157 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d822427..500bc9b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 boolean


-Returns true if recovery is paused.
+Returns true if recovery pause is requested.
+   
+  
+
+  
+   
+
+ pg_get_wal_replay_pause_state
+
+pg_get_wal_replay_pause_state ()
+text
+   
+   
+Returns recovery pause state.  The return values are 
+not paused if pause is not requested, 
+pause requested if pause is requested but recovery is
+not yet paused and, paused if the recovery is
+actually paused.

   
 
@@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 void


-Pauses recovery.  While recovery is paused, no further database
-changes are applied.  If hot standby is active, all new queries will
-see the same consistent snapshot of the database, and no further query
-conflicts will be generated until recovery is resumed.
+Request to pause recovery.  A request doesn't mean that recovery stops
+right away.  If you want a guarantee that recovery is actually paused,
+you need to check for the recovery pause state returned by
+pg_get_wal_replay_pause_state().  Note that
+pg_is_wal_replay_paused() returns whether a request
+is made.  While recovery is paused, no further database changes are applied.
+If hot standby is active, all new queries will see the same consistent
+snapshot of the database, and no further query conflicts will be generated
+until recovery is resumed.


 This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e0c37f7..5c94be7 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -721,8 +721,8 @@ typedef struct XLogCtlData
 	 * only relevant for replication or archive recovery
 	 */
 	TimestampTz currentChunkStartTime;
-	/* Are we requested to pause recovery? */
-	bool		recoveryPause;
+	/* Recovery pause state */
+	RecoveryPauseState	recoveryPauseState;
 
 	/*
 	 * lastFpwDisableRecPtr points to the start of the last replayed
@@ -894,6 +894,7 @@ static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
+static void ConfirmRecoveryPaused(void);
 static void recoveryPausesHere(bool endOfRecovery);
 static bool recoveryApplyDelay(XLogReaderState *record);
 static void 

Re: Is Recovery actually paused?

2021-02-24 Thread Dilip Kumar
On Wed, Feb 24, 2021 at 2:26 PM Kyotaro Horiguchi
 wrote:
>
> At Wed, 24 Feb 2021 13:15:27 +0530, Dilip Kumar  wrote 
> in
> > On Wed, Feb 24, 2021 at 12:39 PM Kyotaro Horiguchi
> >  wrote:
> > >
> > > At Tue, 23 Feb 2021 12:03:32 +0530, Dilip Kumar  
> > > wrote in
> > > > On Fri, Feb 12, 2021 at 3:26 AM Robert Haas  
> > > > wrote:
> > > How about something like this?
> > >
> > > Request to pause recovery. Server actually stops recovery at a
> > > convenient time. This can take a few seconds after the request. If you
> > > need to strictly guarantee that no further database change will occur,
> > > you can check using pg_get_wal_replay_ause_state(). Note that
> > > pg_is_wal_replay_paused() may return true before recovery actually
> > > stops.
> >
> > I still think that for the user-facing documentation purpose the
> > current paragraph looks better.
>
> Ok.
>
> > > The patch adds two loops whth the following logic:
> > >
> > >  while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
> > >  {
> > >...
> > >ConfirmRecoveryPaused();
> > >
> > >  }
> > >
> > > After the renaming of the function, the following structure looks
> > > simpler and more natural.
> > >
> > >  while (ConfirmRecoveryPaused())
> > >  {
> > >...
> > >
> > >  }
> >
> > So do you mean that if the pause is requested ConfirmRecoveryPaused
> > will set it to paused and if it is not paused then it will return
> > false?  With the current function name, I don't think that will look
> > clean maybe we should change the name to something like
> > CheckAndConfirmRecoveryPaused?  Or I am fine with the way it is now.
> > Any other thoughts?
>
> I should have took the meaning of "confirm" wrongly.  I took that as
> "somehow determine if the recovery is to be paused".  If that reading
> is completely wrong, I don't mind either re-chaging the function name
> or leaving all it alone.

I am fine with leaving it the way it is unless someone feels that we
should change it.

> > >
> > > +   /* test for recovery pause, if user has requested the 
> > > pause */
> > > +   if (((volatile XLogCtlData *) 
> > > XLogCtl)->recoveryPauseState !=
> > >
> > > The reason for the checkpoint is to move to "paused" state in a
> > > reasonable time.  I think we need to mention that reason rather than
> > > what is done here.
> >
> > I will do that.
>
> Thanks.
>
> > >
> > > +   /* get the recovery pause state */
> > > +   switch(GetRecoveryPauseState())
> > > +   {
> > > +   case RECOVERY_NOT_PAUSED:
> > > +   state = "not paused";
> > > +   break;
> > > ...
> > > +   default:
> > > +   elog(ERROR, "invalid recovery pause state");
> > >
> > > This disables the static enum coverage check and it is not likely to
> > > have a wrong value here, other than the case of shared memory
> > > corruption.  So we can remove the default case
> > > here. pg_get_replication_slots() is going that direction and
> > > apply_dispatch() is taking a slightly different way.  Anyway I think
> > > that we can take away the default case.
> >
> > So do you think we should put an assert(0) in the default case?
>
> No.  Just removing the default in the switch.  If the value comes from
> some other source typically from disk or user-interraction, the
> default is necessary, but, in the first place if we have other than
> the defined value there, it is a sign of something worse than
> ERROR. If we care about that case, we *could* do the same thing with
> apply_dispatch().
>
>   switch (GetRecoveryPauseState())
>   {
>  case RECOVERY_NOT_PAUSED:
>  return cstring_to_text("not paused");
>  ..
>}
>
>/* we shouldn't reach here */
>Assert (0);

I think for such cases IMHO the preferred style for PostgreSQL is that
we add Assert(0) in the default case,  at least it appeared to me that
way.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-24 Thread Kyotaro Horiguchi
At Wed, 24 Feb 2021 17:56:41 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Wed, 24 Feb 2021 13:15:27 +0530, Dilip Kumar  wrote 
> in 
> > > After the renaming of the function, the following structure looks
> > > simpler and more natural.
> > >
> > >  while (ConfirmRecoveryPaused())
> > >  {
> > >...
> > >
> > >  }
> > 
> > So do you mean that if the pause is requested ConfirmRecoveryPaused
> > will set it to paused and if it is not paused then it will return
> > false?  With the current function name, I don't think that will look
> > clean maybe we should change the name to something like
> > CheckAndConfirmRecoveryPaused?  Or I am fine with the way it is now.
> > Any other thoughts?
> 
> I should have took the meaning of "confirm" wrongly.  I took that as
> "somehow determine if the recovery is to be paused".  If that reading
> is completely wrong, I don't mind either re-chaging the function name
> or leaving all it alone.

Ouch. If we choose to re-rename it, it won't be "CheckAnd...".
RecoveryIsPaused() is used for another meaning.  Maybe
RecoveryPauseTriggerd() or such?  (I'm not sure, sorry..)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-24 Thread Kyotaro Horiguchi
At Wed, 24 Feb 2021 13:15:27 +0530, Dilip Kumar  wrote 
in 
> On Wed, Feb 24, 2021 at 12:39 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Tue, 23 Feb 2021 12:03:32 +0530, Dilip Kumar  
> > wrote in
> > > On Fri, Feb 12, 2021 at 3:26 AM Robert Haas  wrote:
> > How about something like this?
> >
> > Request to pause recovery. Server actually stops recovery at a
> > convenient time. This can take a few seconds after the request. If you
> > need to strictly guarantee that no further database change will occur,
> > you can check using pg_get_wal_replay_ause_state(). Note that
> > pg_is_wal_replay_paused() may return true before recovery actually
> > stops.
> 
> I still think that for the user-facing documentation purpose the
> current paragraph looks better.

Ok.

> > The patch adds two loops whth the following logic:
> >
> >  while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
> >  {
> >...
> >ConfirmRecoveryPaused();
> >
> >  }
> >
> > After the renaming of the function, the following structure looks
> > simpler and more natural.
> >
> >  while (ConfirmRecoveryPaused())
> >  {
> >...
> >
> >  }
> 
> So do you mean that if the pause is requested ConfirmRecoveryPaused
> will set it to paused and if it is not paused then it will return
> false?  With the current function name, I don't think that will look
> clean maybe we should change the name to something like
> CheckAndConfirmRecoveryPaused?  Or I am fine with the way it is now.
> Any other thoughts?

I should have took the meaning of "confirm" wrongly.  I took that as
"somehow determine if the recovery is to be paused".  If that reading
is completely wrong, I don't mind either re-chaging the function name
or leaving all it alone.

> >
> > +   /* test for recovery pause, if user has requested the pause 
> > */
> > +   if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState 
> > !=
> >
> > The reason for the checkpoint is to move to "paused" state in a
> > reasonable time.  I think we need to mention that reason rather than
> > what is done here.
> 
> I will do that.

Thanks.

> >
> > +   /* get the recovery pause state */
> > +   switch(GetRecoveryPauseState())
> > +   {
> > +   case RECOVERY_NOT_PAUSED:
> > +   state = "not paused";
> > +   break;
> > ...
> > +   default:
> > +   elog(ERROR, "invalid recovery pause state");
> >
> > This disables the static enum coverage check and it is not likely to
> > have a wrong value here, other than the case of shared memory
> > corruption.  So we can remove the default case
> > here. pg_get_replication_slots() is going that direction and
> > apply_dispatch() is taking a slightly different way.  Anyway I think
> > that we can take away the default case.
> 
> So do you think we should put an assert(0) in the default case?

No.  Just removing the default in the switch.  If the value comes from
some other source typically from disk or user-interraction, the
default is necessary, but, in the first place if we have other than
the defined value there, it is a sign of something worse than
ERROR. If we care about that case, we *could* do the same thing with
apply_dispatch().

  switch (GetRecoveryPauseState())
  {
 case RECOVERY_NOT_PAUSED:
 return cstring_to_text("not paused");
 ..
   }

   /* we shouldn't reach here */
   Assert (0);

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-23 Thread Dilip Kumar
On Wed, Feb 24, 2021 at 12:39 PM Kyotaro Horiguchi
 wrote:
>
> At Tue, 23 Feb 2021 12:03:32 +0530, Dilip Kumar  wrote 
> in
> > On Fri, Feb 12, 2021 at 3:26 AM Robert Haas  wrote:
> > > There might be some more to say here, but those are things I notice on
> > > a first read-through.
> >
> > Okay.
>
> It seems to me all the suggestions are addressed in this version.
>
> +Request to pause recovery.  A request doesn't mean that recovery 
> stops
> +right away.  If you want a guarantee that recovery is actually 
> paused,
> +you need to check for the recovery pause state returned by
> +pg_get_wal_replay_pause_state().  Note that
> +pg_is_wal_replay_paused() returns whether a 
> request
> +is made.  While recovery is paused, no further database changes are 
> applied.
>
> This looks like explainig the same thing twice. ("A request doesn't
> mean.." and "While recovery is paused, ...")
>
> How about something like this?
>
> Request to pause recovery. Server actually stops recovery at a
> convenient time. This can take a few seconds after the request. If you
> need to strictly guarantee that no further database change will occur,
> you can check using pg_get_wal_replay_ause_state(). Note that
> pg_is_wal_replay_paused() may return true before recovery actually
> stops.

I still think that for the user-facing documentation purpose the
current paragraph looks better.

> The patch adds two loops whth the following logic:
>
>  while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
>  {
>...
>ConfirmRecoveryPaused();
>
>  }
>
> After the renaming of the function, the following structure looks
> simpler and more natural.
>
>  while (ConfirmRecoveryPaused())
>  {
>...
>
>  }

So do you mean that if the pause is requested ConfirmRecoveryPaused
will set it to paused and if it is not paused then it will return
false?  With the current function name, I don't think that will look
clean maybe we should change the name to something like
CheckAndConfirmRecoveryPaused?  Or I am fine with the way it is now.
Any other thoughts?

>
> +   /* test for recovery pause, if user has requested the pause */
> +   if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState !=
>
> The reason for the checkpoint is to move to "paused" state in a
> reasonable time.  I think we need to mention that reason rather than
> what is done here.

I will do that.

>
> +   /* get the recovery pause state */
> +   switch(GetRecoveryPauseState())
> +   {
> +   case RECOVERY_NOT_PAUSED:
> +   state = "not paused";
> +   break;
> ...
> +   default:
> +   elog(ERROR, "invalid recovery pause state");
>
> This disables the static enum coverage check and it is not likely to
> have a wrong value here, other than the case of shared memory
> corruption.  So we can remove the default case
> here. pg_get_replication_slots() is going that direction and
> apply_dispatch() is taking a slightly different way.  Anyway I think
> that we can take away the default case.

So do you think we should put an assert(0) in the default case?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-23 Thread Kyotaro Horiguchi
At Tue, 23 Feb 2021 12:03:32 +0530, Dilip Kumar  wrote 
in 
> On Fri, Feb 12, 2021 at 3:26 AM Robert Haas  wrote:
> > There might be some more to say here, but those are things I notice on
> > a first read-through.
> 
> Okay.

It seems to me all the suggestions are addressed in this version.

+Request to pause recovery.  A request doesn't mean that recovery stops
+right away.  If you want a guarantee that recovery is actually paused,
+you need to check for the recovery pause state returned by
+pg_get_wal_replay_pause_state().  Note that
+pg_is_wal_replay_paused() returns whether a 
request
+is made.  While recovery is paused, no further database changes are 
applied.

This looks like explainig the same thing twice. ("A request doesn't
mean.." and "While recovery is paused, ...")

How about something like this?

Request to pause recovery. Server actually stops recovery at a
convenient time. This can take a few seconds after the request. If you
need to strictly guarantee that no further database change will occur,
you can check using pg_get_wal_replay_ause_state(). Note that
pg_is_wal_replay_paused() may return true before recovery actually
stops.


The patch adds two loops whth the following logic:

 while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
 {
   ...
   ConfirmRecoveryPaused();
   
 }

After the renaming of the function, the following structure looks
simpler and more natural.

 while (ConfirmRecoveryPaused())
 {
   ...
   
 }


+   /* test for recovery pause, if user has requested the pause */
+   if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState !=

The reason for the checkpoint is to move to "paused" state in a
reasonable time.  I think we need to mention that reason rather than
what is done here.


+   /* get the recovery pause state */
+   switch(GetRecoveryPauseState())
+   {
+   case RECOVERY_NOT_PAUSED:
+   state = "not paused";
+   break;
...
+   default:
+   elog(ERROR, "invalid recovery pause state");

This disables the static enum coverage check and it is not likely to
have a wrong value here, other than the case of shared memory
corruption.  So we can remove the default case
here. pg_get_replication_slots() is going that direction and
apply_dispatch() is taking a slightly different way.  Anyway I think
that we can take away the default case.


regard.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-22 Thread Dilip Kumar
On Fri, Feb 12, 2021 at 3:26 AM Robert Haas  wrote:
>
> On Thu, Feb 11, 2021 at 6:07 AM Dilip Kumar  wrote:
> > > Thanks for the patch. I tested the new function and it works as
> > > expected. I have no further comments on the v13 patch.
> >
> > Thanks for the review and testing.
>
> I don't see a whole lot wrong with this patch, but I think there are
> some things that could make it a little clearer:

Thanks for the review

> - I suggest renaming CheckAndSetRecoveryPause() to ConfirmRecoveryPaused().

Yeah that make more sense so changed.

> - I suggest moving the definition of that function to just after
> SetRecoveryPause().

Done

> - I suggest changing the argument to SetRecoveryPause() back to bool.
> In the one place where you call SetRecoveryPause(RECOVERY_PAUSED),
> just call SetRecoveryPause(true) and ConfirmRecoveryPaused() back to
> back.

Yeah done that way,  I think only in once place we were doing
SetRecoveryPause(RECOVERY_PAUSED), but after putting more thought I
think that was not required because right after setting that we are
having the while loop under that we have to call
ConfirmRecoveryPaused.  So I have changed that also as
SetRecoveryPause(true) without immediate call of
ConfirmRecoveryPaused.

This in turn means that the "if" statement in
> SetRecoveryPaused() can be rewritten as if (!recoveryPaused)
> XLogCtl->recoveryPauseState = RECOVERY_NOT_PAUSED else if
> (XLogCtl->recoveryPauseState == RECOVERY_NOT_PAUSED)
> XLogCtl->recoveryPauseState = RECOVERY_PAUSE_REQUESTED(). This is
> slightly less efficient, but I don't think it matters, and I think it
> will be a lot more clear what's the job of SetRecoveryPause (say
> whether we're trying to pause or not) and what's the job of
> ConfirmRecoveryPaused (say whether we've succeeded in pausing).

Done

> - Since the numeric values of RecoveryPauseState don't matter and the
> values are never visible to anything outside the server nor stored on
> disk, I would be inclined to (a) not specify particular values in
> xlog.h and (b) remove the test-and-elog in SetRecoveryPause().

Done

> - In the places where you say:
>
> -if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
> +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
> +RECOVERY_PAUSE_REQUESTED)
>
> ...I would suggest instead testing for != RECOVERY_NOT_PAUSED. Perhaps
> we don't think RECOVERY_PAUSED can happen here. But if somehow it did,
> calling recoveryPausesHere() would be right.

Done

> There might be some more to say here, but those are things I notice on
> a first read-through.

Okay.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From cf45d795946c0f33f7f29b533f175139e2a92583 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 27 Jan 2021 16:46:04 +0530
Subject: [PATCH v14] Provide a new interface to get the recovery pause status

Currently, pg_is_wal_replay_paused, just checks whether the recovery
pause is requested or not but it doesn't actually tell whether the
recovery is actually paused or not.  So basically there is no way for
the user to know the actual status of the pause request.  This patch
provides a new interface pg_get_wal_replay_pause_state that will
return the actual status of the recovery pause i.e.'not paused' if
pause is not requested, 'pause requested' if pause is requested but
recovery is not yet paused and 'paused' if recovery is actually paused.
---
 doc/src/sgml/func.sgml | 32 +++---
 src/backend/access/transam/xlog.c  | 81 --
 src/backend/access/transam/xlogfuncs.c | 46 ++-
 src/include/access/xlog.h  | 10 -
 src/include/catalog/pg_proc.dat|  4 ++
 5 files changed, 151 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d822427..500bc9b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 boolean


-Returns true if recovery is paused.
+Returns true if recovery pause is requested.
+   
+  
+
+  
+   
+
+ pg_get_wal_replay_pause_state
+
+pg_get_wal_replay_pause_state ()
+text
+   
+   
+Returns recovery pause state.  The return values are 
+not paused if pause is not requested, 
+pause requested if pause is requested but recovery is
+not yet paused and, paused if the recovery is
+actually paused.

   
 
@@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 void


-Pauses recovery.  While recovery is paused, no further database
-changes are applied.  If hot standby is active, all new queries will
-see the same consistent snapshot of the database, and no further query
-

Re: Is Recovery actually paused?

2021-02-11 Thread Kyotaro Horiguchi
At Fri, 12 Feb 2021 13:33:32 +0900, Yugo NAGATA  wrote in 
> I don't think that we need to include the waiting approach in 
> pg_get_wal_replay_pause_state
> patch. However, Horiguchi-san's patch may be useful for some users who want
> pg_wal_replay_pause to wait until recovery gets paused instead of polling the
> state from applications. So, I shink we could discuss  this patch  in another
> thread as another commitfest entry independent from 
> pg_get_wal_replay_pause_state.

Since what I'm proposing is not making pg_wal_replay_pause() to wait,
and no one seems on my side, I withdraw the proposal.

> I have no futher comments on the v13 patch, too.  Also, I agree with
> Robert Haas's suggestions.

Yeah, look reasonable.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-11 Thread Yugo NAGATA
On Thu, 11 Feb 2021 16:36:55 +0530
Dilip Kumar  wrote:

> On Thu, Feb 11, 2021 at 3:20 PM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar  
> > > wrote:
> > > >
> > > > I don't find any problem with this approach as well, but I personally
> > > > feel that the other approach where we don't wait in any API and just
> > > > return the recovery pause state is much simpler and more flexible.  So
> > > > I will make the pending changes in that patch and let's see what are
> > > > the other opinion and based on that we can conclude.  Thanks for the
> > > > patch.

I don't think that we need to include the waiting approach in 
pg_get_wal_replay_pause_state
patch. However, Horiguchi-san's patch may be useful for some users who want
pg_wal_replay_pause to wait until recovery gets paused instead of polling the
state from applications. So, I shink we could discuss  this patch  in another
thread as another commitfest entry independent from 
pg_get_wal_replay_pause_state.

> > > Here is an updated version of the patch which fixes the last two open 
> > > problems
> > > 1. In RecoveryRequiresIntParameter set the recovery pause state in the
> > > loop so that if recovery resumed and pause requested again we can set
> > > to pause again.
> > > 2. If the recovery state is already 'paused' then don't set it back to
> > > the 'pause requested'.
> > >
> > > One more point is that in 'pg_wal_replay_pause' even if we don't
> > > change the state because it was already set to the 'paused' then also
> > > we call the WakeupRecovery.  But I don't think there is any problem
> > > with that, if we think that this should be changed then we can make
> > > SetRecoveryPause return a bool such that if it doesn't do state change
> > > then it returns false and in that case we can avoid calling
> > > WakeupRecovery, but I felt that is unnecessary.  Any other thoughts on
> > > this?
> >
> > IMO, that WakeupRecovery should not be a problem, because even now, if
> > we issue a simple select pg_reload_conf(); (without even changing any
> > config parameter), WakeupRecovery gets called.
> >
> > Thanks for the patch. I tested the new function and it works as
> > expected. I have no further comments on the v13 patch.
> 
> Thanks for the review and testing.

I have no futher comments on the v13 patch, too.  Also, I agree with
Robert Haas's suggestions.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-02-11 Thread Robert Haas
On Thu, Feb 11, 2021 at 6:07 AM Dilip Kumar  wrote:
> > Thanks for the patch. I tested the new function and it works as
> > expected. I have no further comments on the v13 patch.
>
> Thanks for the review and testing.

I don't see a whole lot wrong with this patch, but I think there are
some things that could make it a little clearer:

- I suggest renaming CheckAndSetRecoveryPause() to ConfirmRecoveryPaused().

- I suggest moving the definition of that function to just after
SetRecoveryPause().

- I suggest changing the argument to SetRecoveryPause() back to bool.
In the one place where you call SetRecoveryPause(RECOVERY_PAUSED),
just call SetRecoveryPause(true) and ConfirmRecoveryPaused() back to
back. This in turn means that the "if" statement in
SetRecoveryPaused() can be rewritten as if (!recoveryPaused)
XLogCtl->recoveryPauseState = RECOVERY_NOT_PAUSED else if
(XLogCtl->recoveryPauseState == RECOVERY_NOT_PAUSED)
XLogCtl->recoveryPauseState = RECOVERY_PAUSE_REQUESTED(). This is
slightly less efficient, but I don't think it matters, and I think it
will be a lot more clear what's the job of SetRecoveryPause (say
whether we're trying to pause or not) and what's the job of
ConfirmRecoveryPaused (say whether we've succeeded in pausing).

- Since the numeric values of RecoveryPauseState don't matter and the
values are never visible to anything outside the server nor stored on
disk, I would be inclined to (a) not specify particular values in
xlog.h and (b) remove the test-and-elog in SetRecoveryPause().

- In the places where you say:

-if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
+if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
+RECOVERY_PAUSE_REQUESTED)

...I would suggest instead testing for != RECOVERY_NOT_PAUSED. Perhaps
we don't think RECOVERY_PAUSED can happen here. But if somehow it did,
calling recoveryPausesHere() would be right.

There might be some more to say here, but those are things I notice on
a first read-through.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-11 Thread Dilip Kumar
On Thu, Feb 11, 2021 at 3:20 PM Bharath Rupireddy
 wrote:
>
> On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar  wrote:
> >
> > On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar  wrote:
> > >
> > > I don't find any problem with this approach as well, but I personally
> > > feel that the other approach where we don't wait in any API and just
> > > return the recovery pause state is much simpler and more flexible.  So
> > > I will make the pending changes in that patch and let's see what are
> > > the other opinion and based on that we can conclude.  Thanks for the
> > > patch.
> >
> > Here is an updated version of the patch which fixes the last two open 
> > problems
> > 1. In RecoveryRequiresIntParameter set the recovery pause state in the
> > loop so that if recovery resumed and pause requested again we can set
> > to pause again.
> > 2. If the recovery state is already 'paused' then don't set it back to
> > the 'pause requested'.
> >
> > One more point is that in 'pg_wal_replay_pause' even if we don't
> > change the state because it was already set to the 'paused' then also
> > we call the WakeupRecovery.  But I don't think there is any problem
> > with that, if we think that this should be changed then we can make
> > SetRecoveryPause return a bool such that if it doesn't do state change
> > then it returns false and in that case we can avoid calling
> > WakeupRecovery, but I felt that is unnecessary.  Any other thoughts on
> > this?
>
> IMO, that WakeupRecovery should not be a problem, because even now, if
> we issue a simple select pg_reload_conf(); (without even changing any
> config parameter), WakeupRecovery gets called.
>
> Thanks for the patch. I tested the new function and it works as
> expected. I have no further comments on the v13 patch.

Thanks for the review and testing.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-11 Thread Bharath Rupireddy
On Wed, Feb 10, 2021 at 8:39 PM Dilip Kumar  wrote:
>
> On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar  wrote:
> >
> > I don't find any problem with this approach as well, but I personally
> > feel that the other approach where we don't wait in any API and just
> > return the recovery pause state is much simpler and more flexible.  So
> > I will make the pending changes in that patch and let's see what are
> > the other opinion and based on that we can conclude.  Thanks for the
> > patch.
>
> Here is an updated version of the patch which fixes the last two open problems
> 1. In RecoveryRequiresIntParameter set the recovery pause state in the
> loop so that if recovery resumed and pause requested again we can set
> to pause again.
> 2. If the recovery state is already 'paused' then don't set it back to
> the 'pause requested'.
>
> One more point is that in 'pg_wal_replay_pause' even if we don't
> change the state because it was already set to the 'paused' then also
> we call the WakeupRecovery.  But I don't think there is any problem
> with that, if we think that this should be changed then we can make
> SetRecoveryPause return a bool such that if it doesn't do state change
> then it returns false and in that case we can avoid calling
> WakeupRecovery, but I felt that is unnecessary.  Any other thoughts on
> this?

IMO, that WakeupRecovery should not be a problem, because even now, if
we issue a simple select pg_reload_conf(); (without even changing any
config parameter), WakeupRecovery gets called.

Thanks for the patch. I tested the new function and it works as
expected. I have no further comments on the v13 patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-10 Thread Dilip Kumar
On Wed, Feb 10, 2021 at 10:02 AM Dilip Kumar  wrote:
>
> I don't find any problem with this approach as well, but I personally
> feel that the other approach where we don't wait in any API and just
> return the recovery pause state is much simpler and more flexible.  So
> I will make the pending changes in that patch and let's see what are
> the other opinion and based on that we can conclude.  Thanks for the
> patch.

Here is an updated version of the patch which fixes the last two open problems
1. In RecoveryRequiresIntParameter set the recovery pause state in the
loop so that if recovery resumed and pause requested again we can set
to pause again.
2. If the recovery state is already 'paused' then don't set it back to
the 'pause requested'.

One more point is that in 'pg_wal_replay_pause' even if we don't
change the state because it was already set to the 'paused' then also
we call the WakeupRecovery.  But I don't think there is any problem
with that, if we think that this should be changed then we can make
SetRecoveryPause return a bool such that if it doesn't do state change
then it returns false and in that case we can avoid calling
WakeupRecovery, but I felt that is unnecessary.  Any other thoughts on
this?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 4f7f3e11c3c6354225ddb80bb06b503e940cd8ff Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 27 Jan 2021 16:46:04 +0530
Subject: [PATCH v13] Provide a new interface to get the recovery pause status

Currently, pg_is_wal_replay_paused, just checks whether the recovery
pause is requested or not but it doesn't actually tell whether the
recovery is actually paused or not.  So basically there is no way for
the user to know the actual status of the pause request.  This patch
provides a new interface pg_get_wal_replay_pause_state that will
return the actual status of the recovery pause i.e.'not paused' if
pause is not requested, 'pause requested' if pause is requested but
recovery is not yet paused and 'paused' if recovery is actually paused.
---
 doc/src/sgml/func.sgml | 32 +++--
 src/backend/access/transam/xlog.c  | 86 +++---
 src/backend/access/transam/xlogfuncs.c | 50 ++--
 src/include/access/xlog.h  | 12 -
 src/include/catalog/pg_proc.dat|  4 ++
 5 files changed, 156 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1ab31a9..9b2c429 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 boolean


-Returns true if recovery is paused.
+Returns true if recovery pause is requested.
+   
+  
+
+  
+   
+
+ pg_get_wal_replay_pause_state
+
+pg_get_wal_replay_pause_state ()
+text
+   
+   
+Returns recovery pause state.  The return values are 
+not paused if pause is not requested, 
+pause requested if pause is requested but recovery is
+not yet paused and, paused if the recovery is
+actually paused.

   
 
@@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 void


-Pauses recovery.  While recovery is paused, no further database
-changes are applied.  If hot standby is active, all new queries will
-see the same consistent snapshot of the database, and no further query
-conflicts will be generated until recovery is resumed.
+Request to pause recovery.  A request doesn't mean that recovery stops
+right away.  If you want a guarantee that recovery is actually paused,
+you need to check for the recovery pause state returned by
+pg_get_wal_replay_pause_state().  Note that
+pg_is_wal_replay_paused() returns whether a request
+is made.  While recovery is paused, no further database changes are applied.
+If hot standby is active, all new queries will see the same consistent
+snapshot of the database, and no further query conflicts will be generated
+until recovery is resumed.


 This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8e3b5df..459a19b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -721,8 +721,8 @@ typedef struct XLogCtlData
 	 * only relevant for replication or archive recovery
 	 */
 	TimestampTz currentChunkStartTime;
-	/* Are we requested to pause recovery? */
-	bool		recoveryPause;
+	/* Recovery pause state */
+	RecoveryPauseState	recoveryPauseState;
 
 	/*
 	 * lastFpwDisableRecPtr points to the start of the last replayed
@@ -894,6 +894,7 @@ static void validateRecoveryParameters(void);
 

Re: Is Recovery actually paused?

2021-02-09 Thread Dilip Kumar
On Wed, Feb 10, 2021 at 8:19 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 9 Feb 2021 12:27:21 +0530, Bharath Rupireddy 
>  wrote in
> > What I meant was that if we were to add waiting logic inside
> > pg_wal_replay_pause, we should also have a timeout with some default
> > value, to avoid pg_wal_replay_pause waiting forever in the waiting
> > loop. Within that timeout, if the recovery isn't paused,
> > pg_wal_replay_pause will return probably a warning and a false(this
> > requires us to change the return value of the existing
> > pg_wal_replay_pause)?
>
> I thought that rm_redo finishes shortly unless any trouble
> happens. But on second thought, I found that I forgot a case of a
> recovery-conflict. So as you pointed out, pg_wal_replay_pause() needs
> a flag 'wait' to wait for a pause established. And the flag can be
> turned into "timeout".
>
> # And the prevous verision had another silly bug.
>
> > To avoid changing the existing API and return type, a new function
> > pg_get_wal_replay_pause_state is introduced.
>
> I mentioned about IN parameters, not OUTs. IN parameters can be
> optional to accept existing usage. pg_wal_replay_pause() is changed
> that way in the attached.
>
> If all of you still disagree with my proposal, I withdraw it.

I don't find any problem with this approach as well, but I personally
feel that the other approach where we don't wait in any API and just
return the recovery pause state is much simpler and more flexible.  So
I will make the pending changes in that patch and let's see what are
the other opinion and based on that we can conclude.  Thanks for the
patch.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-09 Thread Kyotaro Horiguchi
At Tue, 9 Feb 2021 12:27:21 +0530, Bharath Rupireddy 
 wrote in 
> What I meant was that if we were to add waiting logic inside
> pg_wal_replay_pause, we should also have a timeout with some default
> value, to avoid pg_wal_replay_pause waiting forever in the waiting
> loop. Within that timeout, if the recovery isn't paused,
> pg_wal_replay_pause will return probably a warning and a false(this
> requires us to change the return value of the existing
> pg_wal_replay_pause)?

I thought that rm_redo finishes shortly unless any trouble
happens. But on second thought, I found that I forgot a case of a
recovery-conflict. So as you pointed out, pg_wal_replay_pause() needs
a flag 'wait' to wait for a pause established. And the flag can be
turned into "timeout".

# And the prevous verision had another silly bug.

> To avoid changing the existing API and return type, a new function
> pg_get_wal_replay_pause_state is introduced.

I mentioned about IN parameters, not OUTs. IN parameters can be
optional to accept existing usage. pg_wal_replay_pause() is changed
that way in the attached.

If all of you still disagree with my proposal, I withdraw it.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1ab31a9056..7eb93f74dd 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25320,14 +25320,19 @@ postgres=# SELECT * FROM 
pg_walfile_name_offset(pg_stop_backup());
 
  pg_wal_replay_pause
 
-pg_wal_replay_pause ()
+pg_wal_replay_pause (
+ timeout integer
+ )
 void


 Pauses recovery.  While recovery is paused, no further database
 changes are applied.  If hot standby is active, all new queries will
 see the same consistent snapshot of the database, and no further query
-conflicts will be generated until recovery is resumed.
+conflicts will be generated until recovery is resumed.  Zero or
+positive timeout value means the function errors out after that
+milliseconds elapsed before recovery is paused (default is -1, wait
+forever).


 This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 8e3b5df7dc..8fd614cded 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6072,10 +6072,61 @@ RecoveryIsPaused(void)
return recoveryPause;
 }
 
+/*
+ * Pauses recovery.
+ *
+ * It is guaranteed that no WAL replay happens after this function returns. If
+ * timeout is zero or positive, emits ERROR when the timeout is reached before
+ * recovery is paused.
+ */
 void
-SetRecoveryPause(bool recoveryPause)
+SetRecoveryPause(bool recoveryPause, int timeout)
 {
+   TimestampTz finish_time = 0;
+   TimestampTz now;
+   int sleep_ms;
+
SpinLockAcquire(>info_lck);
+
+   /* No need of timeout in the startup process */
+   Assert(!InRecovery || timeout < 0);
+
+   /*
+* Wait for the concurrent rm_redo() to finish, so that no records will 
be
+* applied after this function returns. No need to wait while resuming.
+* Anyway we are requesting a recovery pause, we don't mind a possible 
slow
+* down of recovery by the info_lck here.  We don't need to wait in the
+* startup process since no concurrent rm_redo() runs.
+*/
+   while(!InRecovery &&
+ recoveryPause && !XLogCtl->recoveryPause &&
+ XLogCtl->replayEndRecPtr != XLogCtl->lastReplayedEndRecPtr)
+   {
+   SpinLockRelease(>info_lck);
+   now = GetCurrentTimestamp();
+
+   if (timeout >= 0)
+   {
+   if (timeout > 0 && finish_time == 0)
+   finish_time = TimestampTzPlusMilliseconds(now, 
timeout);
+
+   if (finish_time < now)
+   ereport(ERROR,
+   
(errcode(ERRCODE_SQL_STATEMENT_NOT_YET_COMPLETE),
+errmsg ("could not pause 
recovery: timed out")));
+   }
+
+   CHECK_FOR_INTERRUPTS();
+
+   sleep_ms = 1L;  /* 10 ms */
+
+   /* finish_time may be reached earlier than 10ms */
+   if (finish_time > 0)
+   Min(sleep_ms, TimestampDifferenceMilliseconds(now, 
finish_time));
+
+   pg_usleep(sleep_ms);
+   SpinLockAcquire(>info_lck);
+   }
XLogCtl->recoveryPause = recoveryPause;
SpinLockRelease(>info_lck);
 }
@@ -6270,7 +6321,7 @@ RecoveryRequiresIntParameter(const char *param_name, int 
currValue, int minValue
   currValue,
  

Re: Is Recovery actually paused?

2021-02-08 Thread Bharath Rupireddy
On Tue, Feb 9, 2021 at 11:30 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 9 Feb 2021 09:58:30 +0530, Bharath Rupireddy 
>  wrote in
> > On Tue, Feb 9, 2021 at 9:48 AM Dilip Kumar  wrote:
> > >
> > > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA  wrote:
> > > >
> > > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> > > > Kyotaro Horiguchi  wrote:
> > > > > If we are going to introduce that complexity, I'd like to re-propose
> > > > > to introduce interlocking between the recovery side and the
> > > > > pause-requestor side instead of introducing the intermediate state,
> > > > > which is the cause of the complexity.
> > > > >
> > > > > The attached PoC patch adds:
> > > > >
> > > > > - A solid checkpoint just before calling rm_redo. It doesn't add a
> > > > >   info_lck since the check is done in the existing lock section.
> > > > >
> > > > > - Interlocking between the above and SetRecoveryPause without adding a
> > > > >   shared variable.
> > > > >   (This is what I called "synchronous" before.)
> > > >
> > > > I think waiting in pg_wal_replay_pasue is a possible option, but this 
> > > > will
> > > > also introduce other complexity to codes such as possibility of waiting 
> > > > for
> > > > long or for ever.  For example, waiting in SetRecoveryPause as in your 
> > > > POC
> > > > patch appears to make recovery stuck in  RecoveryRequiresIntParameter.
> > > >
> > >
> > > I agree with this,  I think we previously discussed these approaches
> > > where we can wait in pg_wal_replay_pasue() or
> > > pg_is_wal_replay_pasued().  In fact, we had an older version where we
> > > put the wait in pg_is_wal_replay_pasued().  But it appeared that doing
> > > so will add extra complexity as well as instead of waiting in these
> > > APIs the wait logic can be implemented in the application code which
> > > is actually using these APIs and IMHO that will give better control to
> > > the users.
> >
> > And also, having waiting logic in pg_wal_replay_pasue() or
> > pg_is_wal_replay_pasued() required changes to the existing API such as
> > a timeout to not allow them infinitely waiting.
>
> I don't understand that. pg_wal_replay_pause() is defined as "pausees
> recovery". so it is the correct behavior to wait actual pause.
> pg_is_wal_replay_paused() doesn't wait for anything at all.

What I meant was that if we were to add waiting logic inside
pg_wal_replay_pause, we should also have a timeout with some default
value, to avoid pg_wal_replay_pause waiting forever in the waiting
loop. Within that timeout, if the recovery isn't paused,
pg_wal_replay_pause will return probably a warning and a false(this
requires us to change the return value of the existing
pg_wal_replay_pause)?

To avoid changing the existing API and return type, a new function
pg_get_wal_replay_pause_state is introduced.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-08 Thread Kyotaro Horiguchi
Sorry, I made a mistake here.

At Tue, 09 Feb 2021 14:55:23 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Tue, 9 Feb 2021 09:47:58 +0530, Dilip Kumar  wrote 
> in 

> > APIs the wait logic can be implemented in the application code which
> > is actually using these APIs and IMHO that will give better control to
> > the users.
> 
> Year, with the PoC pg_wal_replay_pause() can make a short wait as a
> side-effect but the tri-state patch also can add a function to wait
> for the state suffices.

I said that it is surprising that pg_is_wal_replay_paused() waits for
the state change. But I didn't say that pg_wal_replay_pause()
shouldn't wait for the actual pause.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-08 Thread Kyotaro Horiguchi
At Tue, 9 Feb 2021 09:58:30 +0530, Bharath Rupireddy 
 wrote in 
> On Tue, Feb 9, 2021 at 9:48 AM Dilip Kumar  wrote:
> >
> > On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA  wrote:
> > >
> > > On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> > > Kyotaro Horiguchi  wrote:
> > > > If we are going to introduce that complexity, I'd like to re-propose
> > > > to introduce interlocking between the recovery side and the
> > > > pause-requestor side instead of introducing the intermediate state,
> > > > which is the cause of the complexity.
> > > >
> > > > The attached PoC patch adds:
> > > >
> > > > - A solid checkpoint just before calling rm_redo. It doesn't add a
> > > >   info_lck since the check is done in the existing lock section.
> > > >
> > > > - Interlocking between the above and SetRecoveryPause without adding a
> > > >   shared variable.
> > > >   (This is what I called "synchronous" before.)
> > >
> > > I think waiting in pg_wal_replay_pasue is a possible option, but this will
> > > also introduce other complexity to codes such as possibility of waiting 
> > > for
> > > long or for ever.  For example, waiting in SetRecoveryPause as in your POC
> > > patch appears to make recovery stuck in  RecoveryRequiresIntParameter.
> > >
> >
> > I agree with this,  I think we previously discussed these approaches
> > where we can wait in pg_wal_replay_pasue() or
> > pg_is_wal_replay_pasued().  In fact, we had an older version where we
> > put the wait in pg_is_wal_replay_pasued().  But it appeared that doing
> > so will add extra complexity as well as instead of waiting in these
> > APIs the wait logic can be implemented in the application code which
> > is actually using these APIs and IMHO that will give better control to
> > the users.
> 
> And also, having waiting logic in pg_wal_replay_pasue() or
> pg_is_wal_replay_pasued() required changes to the existing API such as
> a timeout to not allow them infinitely waiting.

I don't understand that. pg_wal_replay_pause() is defined as "pausees
recovery". so it is the correct behavior to wait actual pause.
pg_is_wal_replay_paused() doesn't wait for anything at all.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-08 Thread Kyotaro Horiguchi
At Tue, 9 Feb 2021 09:47:58 +0530, Dilip Kumar  wrote in 
> On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA  wrote:
> >
> > On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> > Kyotaro Horiguchi  wrote:
> >
> > > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar  
> > > wrote in
> > > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> > > > > Kyotaro Horiguchi  wrote:
> > > > >
> > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA 
> > > > > >  wrote in
> > > > > > > > > > I think the right fix should be that the state should never 
> > > > > > > > > > go from
> > > > > > > > > > ‘paused’ to ‘pause requested’  so I think 
> > > > > > > > > > pg_wal_replay_pause should take
> > > > > > > > > > care of that.
> > > > > > > > >
> > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, 
> > > > > > > > > but I wonder
> > > > > > > > > it can not handle the case that a user resume and pause again 
> > > > > > > > > while a sleep.
> > > > > > > >
> > > > > > > > Right, we will have to check and set in the loop.  But we 
> > > > > > > > should not
> > > > > > > > allow the state to go from paused to pause requested 
> > > > > > > > irrespective of
> > > > > > > > this.
> > > > > > >
> > > > > > > I agree with you.
> > > > > >
> > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming 
> > > > > > we
> > > > > > immediately change the state to PAUSE always we see REQUESTED in the
> > > > > > waiting loop, despite that we allow change the state from PAUSE to
> > > > > > REQUESTED via NOT_PAUSED between two successive loop condition 
> > > > > > checks?
> > > > >
> > > > > If a user call pg_wal_replay_pause while recovery is paused, users can
> > > > > observe 'pause requested' during a sleep alghough the time window is 
> > > > > short.
> > > > > It seems a bit odd that pg_wal_replay_pause changes the state like 
> > > > > this
> > > > > because This state meeans that recovery may not be 'paused'.
> > > >
> > > > Yeah, this appears wrong that after 'paused' we go back to 'pause
> > > > requested'.  the logical state transition should always be as below
> > > >
> > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
> > > > request and then paused but there is nothing wrong with going to
> > > > paused)
> > > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or 
> > > > get paused)
> > > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the
> > > > PAUSE_REQUESTED without going to NOT PAUSED)
> > >
> > > I didn't asked about the internal logical correctness, but asked about
> > > *actual harm* revealed to users. I don't see any actual harm in the
> > > "wrong" transition because:
> >
> > Actually, the incorrect state transition is not so harmful except that
> > users can observe unnecessary state changes. However, I don't think any
> > actual harm in prohibit the incorrect state transition. So, I think we
> > can do it.
> >
> > > If we are going to introduce that complexity, I'd like to re-propose
> > > to introduce interlocking between the recovery side and the
> > > pause-requestor side instead of introducing the intermediate state,
> > > which is the cause of the complexity.
> > >
> > > The attached PoC patch adds:
> > >
> > > - A solid checkpoint just before calling rm_redo. It doesn't add a
> > >   info_lck since the check is done in the existing lock section.
> > >
> > > - Interlocking between the above and SetRecoveryPause without adding a
> > >   shared variable.
> > >   (This is what I called "synchronous" before.)
> >
> > I think waiting in pg_wal_replay_pasue is a possible option, but this will
> > also introduce other complexity to codes such as possibility of waiting for
> > long or for ever.  For example, waiting in SetRecoveryPause as in your POC
> > patch appears to make recovery stuck in  RecoveryRequiresIntParameter.

Ah. Yes, startup process does not need to wait. That is a bug of the
patch. No other callers don't cause the self dead lock.

> I agree with this,  I think we previously discussed these approaches
> where we can wait in pg_wal_replay_pasue() or
> pg_is_wal_replay_pasued().  In fact, we had an older version where we
> put the wait in pg_is_wal_replay_pasued().  But it appeared that doing

Note that the expected waiting period is while calling rmgr_redo(). If
it is stuck for a long time, that suggests something's going wrong.

> so will add extra complexity as well as instead of waiting in these
> APIs the wait logic can be implemented in the application code which
> is actually using these APIs and IMHO that will give better control to
> the users.

Year, with the PoC pg_wal_replay_pause() can make a short wait as a
side-effect but the tri-state patch also can add a function to wait
for the state suffices.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c 

Re: Is Recovery actually paused?

2021-02-08 Thread Kyotaro Horiguchi
At Tue, 9 Feb 2021 12:23:23 +0900, Yugo NAGATA  wrote in 
> On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> Kyotaro Horiguchi  wrote:
> > I didn't asked about the internal logical correctness, but asked about
> > *actual harm* revealed to users. I don't see any actual harm in the
> > "wrong" transition because:
> 
> Actually, the incorrect state transition is not so harmful except that
> users can observe unnecessary state changes. However, I don't think any
> actual harm in prohibit the incorrect state transition. So, I think we
> can do it.

I don't say that we cannot do that. Just it is needeless.

> > If we are going to introduce that complexity, I'd like to re-propose
> > to introduce interlocking between the recovery side and the
> > pause-requestor side instead of introducing the intermediate state,
> > which is the cause of the complexity.
> > 
> > The attached PoC patch adds:
> > 
> > - A solid checkpoint just before calling rm_redo. It doesn't add a
> >   info_lck since the check is done in the existing lock section.
> > 
> > - Interlocking between the above and SetRecoveryPause without adding a
> >   shared variable.
> >   (This is what I called "synchronous" before.)
> 
> I think waiting in pg_wal_replay_pasue is a possible option, but this will
> also introduce other complexity to codes such as possibility of waiting for
> long or for ever.  For example, waiting in SetRecoveryPause as in your POC
> patch appears to make recovery stuck in  RecoveryRequiresIntParameter.

That is easily avoidable CFI in the loop.

> By the way, attaching other patch to a thread without the original patch
> will make commitfest and cfbot APP confused...

Oops! Sorry for that. I forgot to append .txt or such to the file name.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-02-08 Thread Bharath Rupireddy
On Tue, Feb 9, 2021 at 9:48 AM Dilip Kumar  wrote:
>
> On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA  wrote:
> >
> > On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> > Kyotaro Horiguchi  wrote:
> >
> > > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar  
> > > wrote in
> > > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> > > > > Kyotaro Horiguchi  wrote:
> > > > >
> > > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA 
> > > > > >  wrote in
> > > > > > > > > > I think the right fix should be that the state should never 
> > > > > > > > > > go from
> > > > > > > > > > ‘paused’ to ‘pause requested’  so I think 
> > > > > > > > > > pg_wal_replay_pause should take
> > > > > > > > > > care of that.
> > > > > > > > >
> > > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, 
> > > > > > > > > but I wonder
> > > > > > > > > it can not handle the case that a user resume and pause again 
> > > > > > > > > while a sleep.
> > > > > > > >
> > > > > > > > Right, we will have to check and set in the loop.  But we 
> > > > > > > > should not
> > > > > > > > allow the state to go from paused to pause requested 
> > > > > > > > irrespective of
> > > > > > > > this.
> > > > > > >
> > > > > > > I agree with you.
> > > > > >
> > > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming 
> > > > > > we
> > > > > > immediately change the state to PAUSE always we see REQUESTED in the
> > > > > > waiting loop, despite that we allow change the state from PAUSE to
> > > > > > REQUESTED via NOT_PAUSED between two successive loop condition 
> > > > > > checks?
> > > > >
> > > > > If a user call pg_wal_replay_pause while recovery is paused, users can
> > > > > observe 'pause requested' during a sleep alghough the time window is 
> > > > > short.
> > > > > It seems a bit odd that pg_wal_replay_pause changes the state like 
> > > > > this
> > > > > because This state meeans that recovery may not be 'paused'.
> > > >
> > > > Yeah, this appears wrong that after 'paused' we go back to 'pause
> > > > requested'.  the logical state transition should always be as below
> > > >
> > > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
> > > > request and then paused but there is nothing wrong with going to
> > > > paused)
> > > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or 
> > > > get paused)
> > > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the
> > > > PAUSE_REQUESTED without going to NOT PAUSED)
> > >
> > > I didn't asked about the internal logical correctness, but asked about
> > > *actual harm* revealed to users. I don't see any actual harm in the
> > > "wrong" transition because:
> >
> > Actually, the incorrect state transition is not so harmful except that
> > users can observe unnecessary state changes. However, I don't think any
> > actual harm in prohibit the incorrect state transition. So, I think we
> > can do it.
> >
> > > If we are going to introduce that complexity, I'd like to re-propose
> > > to introduce interlocking between the recovery side and the
> > > pause-requestor side instead of introducing the intermediate state,
> > > which is the cause of the complexity.
> > >
> > > The attached PoC patch adds:
> > >
> > > - A solid checkpoint just before calling rm_redo. It doesn't add a
> > >   info_lck since the check is done in the existing lock section.
> > >
> > > - Interlocking between the above and SetRecoveryPause without adding a
> > >   shared variable.
> > >   (This is what I called "synchronous" before.)
> >
> > I think waiting in pg_wal_replay_pasue is a possible option, but this will
> > also introduce other complexity to codes such as possibility of waiting for
> > long or for ever.  For example, waiting in SetRecoveryPause as in your POC
> > patch appears to make recovery stuck in  RecoveryRequiresIntParameter.
> >
>
> I agree with this,  I think we previously discussed these approaches
> where we can wait in pg_wal_replay_pasue() or
> pg_is_wal_replay_pasued().  In fact, we had an older version where we
> put the wait in pg_is_wal_replay_pasued().  But it appeared that doing
> so will add extra complexity as well as instead of waiting in these
> APIs the wait logic can be implemented in the application code which
> is actually using these APIs and IMHO that will give better control to
> the users.

And also, having waiting logic in pg_wal_replay_pasue() or
pg_is_wal_replay_pasued() required changes to the existing API such as
a timeout to not allow them infinitely waiting.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-08 Thread Dilip Kumar
On Tue, Feb 9, 2021 at 8:54 AM Yugo NAGATA  wrote:
>
> On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
> Kyotaro Horiguchi  wrote:
>
> > At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar  
> > wrote in
> > > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA  wrote:
> > > >
> > > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> > > > Kyotaro Horiguchi  wrote:
> > > >
> > > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA  
> > > > > wrote in
> > > > > > > > > I think the right fix should be that the state should never 
> > > > > > > > > go from
> > > > > > > > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause 
> > > > > > > > > should take
> > > > > > > > > care of that.
> > > > > > > >
> > > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but 
> > > > > > > > I wonder
> > > > > > > > it can not handle the case that a user resume and pause again 
> > > > > > > > while a sleep.
> > > > > > >
> > > > > > > Right, we will have to check and set in the loop.  But we should 
> > > > > > > not
> > > > > > > allow the state to go from paused to pause requested irrespective 
> > > > > > > of
> > > > > > > this.
> > > > > >
> > > > > > I agree with you.
> > > > >
> > > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we
> > > > > immediately change the state to PAUSE always we see REQUESTED in the
> > > > > waiting loop, despite that we allow change the state from PAUSE to
> > > > > REQUESTED via NOT_PAUSED between two successive loop condition checks?
> > > >
> > > > If a user call pg_wal_replay_pause while recovery is paused, users can
> > > > observe 'pause requested' during a sleep alghough the time window is 
> > > > short.
> > > > It seems a bit odd that pg_wal_replay_pause changes the state like this
> > > > because This state meeans that recovery may not be 'paused'.
> > >
> > > Yeah, this appears wrong that after 'paused' we go back to 'pause
> > > requested'.  the logical state transition should always be as below
> > >
> > > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
> > > request and then paused but there is nothing wrong with going to
> > > paused)
> > > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get 
> > > paused)
> > > PAUSED -> NOT PAUSED (from PAUSED we should not go to the
> > > PAUSE_REQUESTED without going to NOT PAUSED)
> >
> > I didn't asked about the internal logical correctness, but asked about
> > *actual harm* revealed to users. I don't see any actual harm in the
> > "wrong" transition because:
>
> Actually, the incorrect state transition is not so harmful except that
> users can observe unnecessary state changes. However, I don't think any
> actual harm in prohibit the incorrect state transition. So, I think we
> can do it.
>
> > If we are going to introduce that complexity, I'd like to re-propose
> > to introduce interlocking between the recovery side and the
> > pause-requestor side instead of introducing the intermediate state,
> > which is the cause of the complexity.
> >
> > The attached PoC patch adds:
> >
> > - A solid checkpoint just before calling rm_redo. It doesn't add a
> >   info_lck since the check is done in the existing lock section.
> >
> > - Interlocking between the above and SetRecoveryPause without adding a
> >   shared variable.
> >   (This is what I called "synchronous" before.)
>
> I think waiting in pg_wal_replay_pasue is a possible option, but this will
> also introduce other complexity to codes such as possibility of waiting for
> long or for ever.  For example, waiting in SetRecoveryPause as in your POC
> patch appears to make recovery stuck in  RecoveryRequiresIntParameter.
>

I agree with this,  I think we previously discussed these approaches
where we can wait in pg_wal_replay_pasue() or
pg_is_wal_replay_pasued().  In fact, we had an older version where we
put the wait in pg_is_wal_replay_pasued().  But it appeared that doing
so will add extra complexity as well as instead of waiting in these
APIs the wait logic can be implemented in the application code which
is actually using these APIs and IMHO that will give better control to
the users.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-08 Thread Dilip Kumar
On Tue, Feb 9, 2021 at 7:28 AM Kyotaro Horiguchi
 wrote:
>
> At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar  wrote 
> in
> > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA  wrote:
> > >
> > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> > > Kyotaro Horiguchi  wrote:
> > >
> > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA  
> > > > wrote in
> > > > > > > > I think the right fix should be that the state should never go 
> > > > > > > > from
> > > > > > > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause 
> > > > > > > > should take
> > > > > > > > care of that.
> > > > > > >
> > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I 
> > > > > > > wonder
> > > > > > > it can not handle the case that a user resume and pause again 
> > > > > > > while a sleep.
> > > > > >
> > > > > > Right, we will have to check and set in the loop.  But we should not
> > > > > > allow the state to go from paused to pause requested irrespective of
> > > > > > this.
> > > > >
> > > > > I agree with you.
> > > >
> > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we
> > > > immediately change the state to PAUSE always we see REQUESTED in the
> > > > waiting loop, despite that we allow change the state from PAUSE to
> > > > REQUESTED via NOT_PAUSED between two successive loop condition checks?
> > >
> > > If a user call pg_wal_replay_pause while recovery is paused, users can
> > > observe 'pause requested' during a sleep alghough the time window is 
> > > short.
> > > It seems a bit odd that pg_wal_replay_pause changes the state like this
> > > because This state meeans that recovery may not be 'paused'.
> >
> > Yeah, this appears wrong that after 'paused' we go back to 'pause
> > requested'.  the logical state transition should always be as below
> >
> > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
> > request and then paused but there is nothing wrong with going to
> > paused)
> > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get 
> > paused)
> > PAUSED -> NOT PAUSED (from PAUSED we should not go to the
> > PAUSE_REQUESTED without going to NOT PAUSED)
>
> I didn't asked about the internal logical correctness, but asked about
> *actual harm* revealed to users. I don't see any actual harm in the
> "wrong" transition because:
>
> 1. It is not wrong nor strange that the invoker of pg_wal_replay_pause
>   sees the state PAUSE_REQUESTED before it changes to PAUSED.  Even if
>   the previous state was PAUSED, it is no business of the requestors.

The 'pg_wal_replay_pause' request to pause the recovery so it is fine
to first change the state to PAUSE_REQUESTED and then to PAUSED.  But
if the recovery is already paused then what is the point in bringing
the state back to PAUSE_REQUESTED.  For example, suppose the tool want
to raise the pause request and wait until recovery actually paused, so
if it was already paused then if we bring it back to PAUSE_REQUESTED
then it doesn't look correct and we might need to do an extra wait
cycle in the tool until it reaches back to PAUSED, I don't think that
is really a best design.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-08 Thread Yugo NAGATA
On Tue, 09 Feb 2021 10:58:04 +0900 (JST)
Kyotaro Horiguchi  wrote:

> At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar  wrote 
> in 
> > On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA  wrote:
> > >
> > > On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> > > Kyotaro Horiguchi  wrote:
> > >
> > > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA  
> > > > wrote in
> > > > > > > > I think the right fix should be that the state should never go 
> > > > > > > > from
> > > > > > > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause 
> > > > > > > > should take
> > > > > > > > care of that.
> > > > > > >
> > > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I 
> > > > > > > wonder
> > > > > > > it can not handle the case that a user resume and pause again 
> > > > > > > while a sleep.
> > > > > >
> > > > > > Right, we will have to check and set in the loop.  But we should not
> > > > > > allow the state to go from paused to pause requested irrespective of
> > > > > > this.
> > > > >
> > > > > I agree with you.
> > > >
> > > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we
> > > > immediately change the state to PAUSE always we see REQUESTED in the
> > > > waiting loop, despite that we allow change the state from PAUSE to
> > > > REQUESTED via NOT_PAUSED between two successive loop condition checks?
> > >
> > > If a user call pg_wal_replay_pause while recovery is paused, users can
> > > observe 'pause requested' during a sleep alghough the time window is 
> > > short.
> > > It seems a bit odd that pg_wal_replay_pause changes the state like this
> > > because This state meeans that recovery may not be 'paused'.
> > 
> > Yeah, this appears wrong that after 'paused' we go back to 'pause
> > requested'.  the logical state transition should always be as below
> > 
> > NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
> > request and then paused but there is nothing wrong with going to
> > paused)
> > PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get 
> > paused)
> > PAUSED -> NOT PAUSED (from PAUSED we should not go to the
> > PAUSE_REQUESTED without going to NOT PAUSED)
> 
> I didn't asked about the internal logical correctness, but asked about
> *actual harm* revealed to users. I don't see any actual harm in the
> "wrong" transition because:

Actually, the incorrect state transition is not so harmful except that
users can observe unnecessary state changes. However, I don't think any
actual harm in prohibit the incorrect state transition. So, I think we
can do it.

> If we are going to introduce that complexity, I'd like to re-propose
> to introduce interlocking between the recovery side and the
> pause-requestor side instead of introducing the intermediate state,
> which is the cause of the complexity.
> 
> The attached PoC patch adds:
> 
> - A solid checkpoint just before calling rm_redo. It doesn't add a
>   info_lck since the check is done in the existing lock section.
> 
> - Interlocking between the above and SetRecoveryPause without adding a
>   shared variable.
>   (This is what I called "synchronous" before.)

I think waiting in pg_wal_replay_pasue is a possible option, but this will
also introduce other complexity to codes such as possibility of waiting for
long or for ever.  For example, waiting in SetRecoveryPause as in your POC
patch appears to make recovery stuck in  RecoveryRequiresIntParameter.

By the way, attaching other patch to a thread without the original patch
will make commitfest and cfbot APP confused...


Regards,
Yugo Nagata


-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-02-08 Thread Kyotaro Horiguchi
At Mon, 8 Feb 2021 17:05:52 +0530, Dilip Kumar  wrote in 
> On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA  wrote:
> >
> > On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> > Kyotaro Horiguchi  wrote:
> >
> > > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA  
> > > wrote in
> > > > > > > I think the right fix should be that the state should never go 
> > > > > > > from
> > > > > > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause 
> > > > > > > should take
> > > > > > > care of that.
> > > > > >
> > > > > > It makes sense to take care of this in pg_wal_replay_pause, but I 
> > > > > > wonder
> > > > > > it can not handle the case that a user resume and pause again while 
> > > > > > a sleep.
> > > > >
> > > > > Right, we will have to check and set in the loop.  But we should not
> > > > > allow the state to go from paused to pause requested irrespective of
> > > > > this.
> > > >
> > > > I agree with you.
> > >
> > > Is there any actual harm if PAUSED returns to REQUESETED, assuming we
> > > immediately change the state to PAUSE always we see REQUESTED in the
> > > waiting loop, despite that we allow change the state from PAUSE to
> > > REQUESTED via NOT_PAUSED between two successive loop condition checks?
> >
> > If a user call pg_wal_replay_pause while recovery is paused, users can
> > observe 'pause requested' during a sleep alghough the time window is short.
> > It seems a bit odd that pg_wal_replay_pause changes the state like this
> > because This state meeans that recovery may not be 'paused'.
> 
> Yeah, this appears wrong that after 'paused' we go back to 'pause
> requested'.  the logical state transition should always be as below
> 
> NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
> request and then paused but there is nothing wrong with going to
> paused)
> PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get 
> paused)
> PAUSED -> NOT PAUSED (from PAUSED we should not go to the
> PAUSE_REQUESTED without going to NOT PAUSED)

I didn't asked about the internal logical correctness, but asked about
*actual harm* revealed to users. I don't see any actual harm in the
"wrong" transition because:

1. It is not wrong nor strange that the invoker of pg_wal_replay_pause
  sees the state PAUSE_REQUESTED before it changes to PAUSED.  Even if
  the previous state was PAUSED, it is no business of the requestors.

2. It is no harm in the recovery side since PAUSE_REQUESTED and PAUSED
  are effectively the same state.

3. After we inhibited the direct transition from
  PAUSED->PAUSE_REQUESTED, the effectively the same transition
  PAUSED->NOT_PAUSED->PAUSE_REQUESTED is still allowed. The inhibition
  of the former transition doesn't protect anything other than seeming
  correctness of the transition.

If we are going to introduce that complexity, I'd like to re-propose
to introduce interlocking between the recovery side and the
pause-requestor side instead of introducing the intermediate state,
which is the cause of the complexity.

The problem is due to the looseness of checking for pause requests in
the existing checkponts, and the window after the last checkpoint
until calling rm_redo().

The attached PoC patch adds:

- A solid checkpoint just before calling rm_redo. It doesn't add a
  info_lck since the check is done in the existing lock section.

- Interlocking between the above and SetRecoveryPause without adding a
  shared variable.
  (This is what I called "synchronous" before.)

There's a concern about pausing after updating
XlogCtl->replayEndRecPtr but I don't see an issue yet..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8e3b5df7dc..26aadb4178 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6076,6 +6076,20 @@ void
 SetRecoveryPause(bool recoveryPause)
 {
 	SpinLockAcquire(>info_lck);
+
+	/*
+	 * Wait for the application of the record being applied to finish, so that
+	 * no records will be applied after this function returns. We don't need to
+	 * wait when ending a pause. Anyway we are requesting a recovery pause, we
+	 * don't mind a possible slow down of recovery by the info_lck here.
+	 */
+	while(recoveryPause && !XLogCtl->recoveryPause &&
+		  XLogCtl->replayEndRecPtr != XLogCtl->lastReplayedEndRecPtr)
+	{
+		SpinLockRelease(>info_lck);
+		pg_usleep(1L);		/* 10 ms */
+		SpinLockAcquire(>info_lck);
+	}
 	XLogCtl->recoveryPause = recoveryPause;
 	SpinLockRelease(>info_lck);
 }
@@ -7262,6 +7276,7 @@ StartupXLOG(void)
 			do
 			{
 bool		switchedTLI = false;
+bool		pause_requested = false;
 
 #ifdef WAL_DEBUG
 if (XLOG_DEBUG ||
@@ -7292,11 +7307,9 @@ StartupXLOG(void)
  * Note that we intentionally don't take the info_lck spinlock
  * here.  We might therefore read a slightly stale value of
  * the recoveryPause flag, but it can't be very stale (no
- * worse than the 

Re: Is Recovery actually paused?

2021-02-08 Thread Dilip Kumar
On Mon, Feb 8, 2021 at 2:19 PM Yugo NAGATA  wrote:
>
> On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
> Kyotaro Horiguchi  wrote:
>
> > At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA  wrote 
> > in
> > > > > > I think the right fix should be that the state should never go from
> > > > > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause 
> > > > > > should take
> > > > > > care of that.
> > > > >
> > > > > It makes sense to take care of this in pg_wal_replay_pause, but I 
> > > > > wonder
> > > > > it can not handle the case that a user resume and pause again while a 
> > > > > sleep.
> > > >
> > > > Right, we will have to check and set in the loop.  But we should not
> > > > allow the state to go from paused to pause requested irrespective of
> > > > this.
> > >
> > > I agree with you.
> >
> > Is there any actual harm if PAUSED returns to REQUESETED, assuming we
> > immediately change the state to PAUSE always we see REQUESTED in the
> > waiting loop, despite that we allow change the state from PAUSE to
> > REQUESTED via NOT_PAUSED between two successive loop condition checks?
>
> If a user call pg_wal_replay_pause while recovery is paused, users can
> observe 'pause requested' during a sleep alghough the time window is short.
> It seems a bit odd that pg_wal_replay_pause changes the state like this
> because This state meeans that recovery may not be 'paused'.

Yeah, this appears wrong that after 'paused' we go back to 'pause
requested'.  the logical state transition should always be as below

NOT PAUSED -> PAUSE REQUESTED or PAUSED (maybe we should always go to
request and then paused but there is nothing wrong with going to
paused)
PAUSE REQUESTED -> NOT PAUSE or PAUSED (either cancel the request or get paused)
PAUSED -> NOT PAUSED (from PAUSED we should not go to the
PAUSE_REQUESTED without going to NOT PAUSED)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-08 Thread Yugo NAGATA
On Mon, 08 Feb 2021 17:32:46 +0900 (JST)
Kyotaro Horiguchi  wrote:

> At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA  wrote in 
> > > > > I think the right fix should be that the state should never go from
> > > > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause should 
> > > > > take
> > > > > care of that.
> > > >
> > > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder
> > > > it can not handle the case that a user resume and pause again while a 
> > > > sleep.
> > > 
> > > Right, we will have to check and set in the loop.  But we should not
> > > allow the state to go from paused to pause requested irrespective of
> > > this.
> > 
> > I agree with you.
> 
> Is there any actual harm if PAUSED returns to REQUESETED, assuming we
> immediately change the state to PAUSE always we see REQUESTED in the
> waiting loop, despite that we allow change the state from PAUSE to
> REQUESTED via NOT_PAUSED between two successive loop condition checks?

If a user call pg_wal_replay_pause while recovery is paused, users can
observe 'pause requested' during a sleep alghough the time window is short.  
It seems a bit odd that pg_wal_replay_pause changes the state like this
because This state meeans that recovery may not be 'paused'. 

-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-02-08 Thread Kyotaro Horiguchi
At Mon, 8 Feb 2021 14:12:35 +0900, Yugo NAGATA  wrote in 
> > > > I think the right fix should be that the state should never go from
> > > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause should 
> > > > take
> > > > care of that.
> > >
> > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder
> > > it can not handle the case that a user resume and pause again while a 
> > > sleep.
> > 
> > Right, we will have to check and set in the loop.  But we should not
> > allow the state to go from paused to pause requested irrespective of
> > this.
> 
> I agree with you.

Is there any actual harm if PAUSED returns to REQUESETED, assuming we
immediately change the state to PAUSE always we see REQUESTED in the
waiting loop, despite that we allow change the state from PAUSE to
REQUESTED via NOT_PAUSED between two successive loop condition checks?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Re: Is Recovery actually paused?

2021-02-07 Thread Yugo NAGATA
On Mon, 8 Feb 2021 09:35:00 +0530
Dilip Kumar  wrote:

> On Mon, Feb 8, 2021 at 8:18 AM Yugo NAGATA  wrote:
> >
> > On Mon, 8 Feb 2021 07:51:22 +0530
> > Dilip Kumar  wrote:
> >
> > > On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA  wrote:
> > >
> > > > Hi,
> > > >
> > > > On Sun, 7 Feb 2021 19:27:02 +0530
> > > > Dilip Kumar  wrote:
> > > >
> > > > > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy
> > > > > >  wrote:
> > > > > > > > We can not do that, basically, under one lock we need to check 
> > > > > > > > the
> > > > > > > > state and set it to pause.  Because by the time you release the
> > > > lock
> > > > > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want 
> > > > > > > > to
> > > > set
> > > > > > > > it to RECOVERY_PAUSED.
> > > > > > >
> > > > > > > Got it. Thanks.
> > > > > >
> > > > > > Hi Dilip, I have one more question:
> > > > > >
> > > > > > +/* test for recovery pause, if user has requested the 
> > > > > > pause */
> > > > > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState 
> > > > > > ==
> > > > > > +RECOVERY_PAUSE_REQUESTED)
> > > > > > +recoveryPausesHere(false);
> > > > > > +
> > > > > > +now = GetCurrentTimestamp();
> > > > > > +
> > > > > >
> > > > > > Do we need  now = GetCurrentTimestamp(); here? Because, I see that
> > > > > > whenever the variable now is used within the for loop in
> > > > > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being
> > > > > > used within case XLOG_FROM_STREAM:
> > > > > >
> > > > > > Am I missing something?
> > > > >
> > > > > Yeah, I don't see any reason for doing this, maybe it got copy pasted
> > > > > by mistake.  Thanks for observing this.
> > > >
> > > > I also have a question:
> > > >
> > > > @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char
> > > > *param_name, int currValue, int minValue
> > > >currValue,
> > > >minValue)));
> > > >
> > > > -   SetRecoveryPause(true);
> > > > +   SetRecoveryPause(RECOVERY_PAUSED);
> > > >
> > > > ereport(LOG,
> > > > (errmsg("recovery has paused"),
> > > >  errdetail("If recovery is
> > > > unpaused, the server will shut down."),
> > > >  errhint("You can then restart 
> > > > the
> > > > server after making the necessary configuration changes.")));
> > > >
> > > > -   while (RecoveryIsPaused())
> > > > +   while (GetRecoveryPauseState() !=
> > > > RECOVERY_NOT_PAUSED)
> > > > {
> > > > HandleStartupProcInterrupts();
> > > >
> > > >
> > > >
> > > > If a user call pg_wal_replay_pause while waiting in
> > > > RecoveryRequiresIntParameter,
> > > > the state become 'pause requested' and this never returns to 'paused'.
> > > > Should we check recoveryPauseState in this loop as in
> > >
> > >
> > > I think the right fix should be that the state should never go from
> > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause should take
> > > care of that.
> >
> > It makes sense to take care of this in pg_wal_replay_pause, but I wonder
> > it can not handle the case that a user resume and pause again while a sleep.
> 
> Right, we will have to check and set in the loop.  But we should not
> allow the state to go from paused to pause requested irrespective of
> this.

I agree with you.

-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-02-07 Thread Dilip Kumar
On Mon, Feb 8, 2021 at 9:49 AM Bharath Rupireddy
 wrote:
>
> On Mon, Feb 8, 2021 at 9:35 AM Dilip Kumar  wrote:
> > > > > If a user call pg_wal_replay_pause while waiting in
> > > > > RecoveryRequiresIntParameter,
> > > > > the state become 'pause requested' and this never returns to 'paused'.
> > > > > Should we check recoveryPauseState in this loop as in
> > > >
> > > >
> > > > I think the right fix should be that the state should never go from
> > > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause should 
> > > > take
> > > > care of that.
> > >
> > > It makes sense to take care of this in pg_wal_replay_pause, but I wonder
> > > it can not handle the case that a user resume and pause again while a 
> > > sleep.
> >
> > Right, we will have to check and set in the loop.  But we should not
> > allow the state to go from paused to pause requested irrespective of
> > this.
>
> We can think of a state machine with the states "not paused", "pause
> requested", "paused". While we can go to "not paused" from any state,
> but cannot go to "pause requested" from "paused".
>
> So, will pg_wal_replay_pause throw an error or warning or silently
 > return when it's  called and the state is "paused" already?

It should just silently return because pg_wal_replay_pause just claim
it request to pause, but it not mean that it can not pause
immediately.

 Maybe we
> should add better commenting in pg_wal_replay_pause why we don't set
> "pause requested" when the state is already "paused".



> And also, if we are adding below code in the
> RecoveryRequiresIntParameter loop, it's better to make it a function,
> like your earlier patch.
>
> /*
>  * If recovery pause is requested then set it paused.  While we are in
>  * the loop, user might resume and pause again so set this every time.
>  */
> SpinLockAcquire(>info_lck);
> if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED)
> XLogCtl->recoveryPauseState = RECOVERY_PAUSED;
> SpinLockRelease(>info_lck);

Yes, it should go back to function now as in the older versions.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-07 Thread Bharath Rupireddy
On Mon, Feb 8, 2021 at 9:35 AM Dilip Kumar  wrote:
> > > > If a user call pg_wal_replay_pause while waiting in
> > > > RecoveryRequiresIntParameter,
> > > > the state become 'pause requested' and this never returns to 'paused'.
> > > > Should we check recoveryPauseState in this loop as in
> > >
> > >
> > > I think the right fix should be that the state should never go from
> > > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause should take
> > > care of that.
> >
> > It makes sense to take care of this in pg_wal_replay_pause, but I wonder
> > it can not handle the case that a user resume and pause again while a sleep.
>
> Right, we will have to check and set in the loop.  But we should not
> allow the state to go from paused to pause requested irrespective of
> this.

We can think of a state machine with the states "not paused", "pause
requested", "paused". While we can go to "not paused" from any state,
but cannot go to "pause requested" from "paused".

So, will pg_wal_replay_pause throw an error or warning or silently
return when it's  called and the state is "paused" already? Maybe we
should add better commenting in pg_wal_replay_pause why we don't set
"pause requested" when the state is already "paused".

And also, if we are adding below code in the
RecoveryRequiresIntParameter loop, it's better to make it a function,
like your earlier patch.

/*
 * If recovery pause is requested then set it paused.  While we are in
 * the loop, user might resume and pause again so set this every time.
 */
SpinLockAcquire(>info_lck);
if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED)
XLogCtl->recoveryPauseState = RECOVERY_PAUSED;
SpinLockRelease(>info_lck);

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-07 Thread Dilip Kumar
On Mon, Feb 8, 2021 at 8:18 AM Yugo NAGATA  wrote:
>
> On Mon, 8 Feb 2021 07:51:22 +0530
> Dilip Kumar  wrote:
>
> > On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA  wrote:
> >
> > > Hi,
> > >
> > > On Sun, 7 Feb 2021 19:27:02 +0530
> > > Dilip Kumar  wrote:
> > >
> > > > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy
> > > >  wrote:
> > > > >
> > > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy
> > > > >  wrote:
> > > > > > > We can not do that, basically, under one lock we need to check the
> > > > > > > state and set it to pause.  Because by the time you release the
> > > lock
> > > > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to
> > > set
> > > > > > > it to RECOVERY_PAUSED.
> > > > > >
> > > > > > Got it. Thanks.
> > > > >
> > > > > Hi Dilip, I have one more question:
> > > > >
> > > > > +/* test for recovery pause, if user has requested the pause 
> > > > > */
> > > > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
> > > > > +RECOVERY_PAUSE_REQUESTED)
> > > > > +recoveryPausesHere(false);
> > > > > +
> > > > > +now = GetCurrentTimestamp();
> > > > > +
> > > > >
> > > > > Do we need  now = GetCurrentTimestamp(); here? Because, I see that
> > > > > whenever the variable now is used within the for loop in
> > > > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being
> > > > > used within case XLOG_FROM_STREAM:
> > > > >
> > > > > Am I missing something?
> > > >
> > > > Yeah, I don't see any reason for doing this, maybe it got copy pasted
> > > > by mistake.  Thanks for observing this.
> > >
> > > I also have a question:
> > >
> > > @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char
> > > *param_name, int currValue, int minValue
> > >currValue,
> > >minValue)));
> > >
> > > -   SetRecoveryPause(true);
> > > +   SetRecoveryPause(RECOVERY_PAUSED);
> > >
> > > ereport(LOG,
> > > (errmsg("recovery has paused"),
> > >  errdetail("If recovery is
> > > unpaused, the server will shut down."),
> > >  errhint("You can then restart the
> > > server after making the necessary configuration changes.")));
> > >
> > > -   while (RecoveryIsPaused())
> > > +   while (GetRecoveryPauseState() !=
> > > RECOVERY_NOT_PAUSED)
> > > {
> > > HandleStartupProcInterrupts();
> > >
> > >
> > >
> > > If a user call pg_wal_replay_pause while waiting in
> > > RecoveryRequiresIntParameter,
> > > the state become 'pause requested' and this never returns to 'paused'.
> > > Should we check recoveryPauseState in this loop as in
> >
> >
> > I think the right fix should be that the state should never go from
> > ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause should take
> > care of that.
>
> It makes sense to take care of this in pg_wal_replay_pause, but I wonder
> it can not handle the case that a user resume and pause again while a sleep.

Right, we will have to check and set in the loop.  But we should not
allow the state to go from paused to pause requested irrespective of
this.

I will make these changes and send the updated patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-07 Thread Yugo NAGATA
On Mon, 8 Feb 2021 07:51:22 +0530
Dilip Kumar  wrote:

> On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA  wrote:
> 
> > Hi,
> >
> > On Sun, 7 Feb 2021 19:27:02 +0530
> > Dilip Kumar  wrote:
> >
> > > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy
> > >  wrote:
> > > >
> > > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy
> > > >  wrote:
> > > > > > We can not do that, basically, under one lock we need to check the
> > > > > > state and set it to pause.  Because by the time you release the
> > lock
> > > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to
> > set
> > > > > > it to RECOVERY_PAUSED.
> > > > >
> > > > > Got it. Thanks.
> > > >
> > > > Hi Dilip, I have one more question:
> > > >
> > > > +/* test for recovery pause, if user has requested the pause */
> > > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
> > > > +RECOVERY_PAUSE_REQUESTED)
> > > > +recoveryPausesHere(false);
> > > > +
> > > > +now = GetCurrentTimestamp();
> > > > +
> > > >
> > > > Do we need  now = GetCurrentTimestamp(); here? Because, I see that
> > > > whenever the variable now is used within the for loop in
> > > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being
> > > > used within case XLOG_FROM_STREAM:
> > > >
> > > > Am I missing something?
> > >
> > > Yeah, I don't see any reason for doing this, maybe it got copy pasted
> > > by mistake.  Thanks for observing this.
> >
> > I also have a question:
> >
> > @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char
> > *param_name, int currValue, int minValue
> >currValue,
> >minValue)));
> >
> > -   SetRecoveryPause(true);
> > +   SetRecoveryPause(RECOVERY_PAUSED);
> >
> > ereport(LOG,
> > (errmsg("recovery has paused"),
> >  errdetail("If recovery is
> > unpaused, the server will shut down."),
> >  errhint("You can then restart the
> > server after making the necessary configuration changes.")));
> >
> > -   while (RecoveryIsPaused())
> > +   while (GetRecoveryPauseState() !=
> > RECOVERY_NOT_PAUSED)
> > {
> > HandleStartupProcInterrupts();
> >
> >
> >
> > If a user call pg_wal_replay_pause while waiting in
> > RecoveryRequiresIntParameter,
> > the state become 'pause requested' and this never returns to 'paused'.
> > Should we check recoveryPauseState in this loop as in
> 
> 
> I think the right fix should be that the state should never go from
> ‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause should take
> care of that.

It makes sense to take care of this in pg_wal_replay_pause, but I wonder
it can not handle the case that a user resume and pause again while a sleep.


-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-02-07 Thread Dilip Kumar
On Mon, 8 Feb 2021 at 6:38 AM, Yugo NAGATA  wrote:

> Hi,
>
> On Sun, 7 Feb 2021 19:27:02 +0530
> Dilip Kumar  wrote:
>
> > On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy
> >  wrote:
> > >
> > > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy
> > >  wrote:
> > > > > We can not do that, basically, under one lock we need to check the
> > > > > state and set it to pause.  Because by the time you release the
> lock
> > > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to
> set
> > > > > it to RECOVERY_PAUSED.
> > > >
> > > > Got it. Thanks.
> > >
> > > Hi Dilip, I have one more question:
> > >
> > > +/* test for recovery pause, if user has requested the pause */
> > > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
> > > +RECOVERY_PAUSE_REQUESTED)
> > > +recoveryPausesHere(false);
> > > +
> > > +now = GetCurrentTimestamp();
> > > +
> > >
> > > Do we need  now = GetCurrentTimestamp(); here? Because, I see that
> > > whenever the variable now is used within the for loop in
> > > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being
> > > used within case XLOG_FROM_STREAM:
> > >
> > > Am I missing something?
> >
> > Yeah, I don't see any reason for doing this, maybe it got copy pasted
> > by mistake.  Thanks for observing this.
>
> I also have a question:
>
> @@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char
> *param_name, int currValue, int minValue
>currValue,
>minValue)));
>
> -   SetRecoveryPause(true);
> +   SetRecoveryPause(RECOVERY_PAUSED);
>
> ereport(LOG,
> (errmsg("recovery has paused"),
>  errdetail("If recovery is
> unpaused, the server will shut down."),
>  errhint("You can then restart the
> server after making the necessary configuration changes.")));
>
> -   while (RecoveryIsPaused())
> +   while (GetRecoveryPauseState() !=
> RECOVERY_NOT_PAUSED)
> {
> HandleStartupProcInterrupts();
>
>
>
> If a user call pg_wal_replay_pause while waiting in
> RecoveryRequiresIntParameter,
> the state become 'pause requested' and this never returns to 'paused'.
> Should we check recoveryPauseState in this loop as in


I think the right fix should be that the state should never go from
‘paused’ to ‘pause requested’  so I think pg_wal_replay_pause should take
care of that.

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Is Recovery actually paused?

2021-02-07 Thread Yugo NAGATA
Hi,

On Sun, 7 Feb 2021 19:27:02 +0530
Dilip Kumar  wrote:

> On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy
>  wrote:
> >
> > On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy
> >  wrote:
> > > > We can not do that, basically, under one lock we need to check the
> > > > state and set it to pause.  Because by the time you release the lock
> > > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to set
> > > > it to RECOVERY_PAUSED.
> > >
> > > Got it. Thanks.
> >
> > Hi Dilip, I have one more question:
> >
> > +/* test for recovery pause, if user has requested the pause */
> > +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
> > +RECOVERY_PAUSE_REQUESTED)
> > +recoveryPausesHere(false);
> > +
> > +now = GetCurrentTimestamp();
> > +
> >
> > Do we need  now = GetCurrentTimestamp(); here? Because, I see that
> > whenever the variable now is used within the for loop in
> > WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being
> > used within case XLOG_FROM_STREAM:
> >
> > Am I missing something?
> 
> Yeah, I don't see any reason for doing this, maybe it got copy pasted
> by mistake.  Thanks for observing this.

I also have a question:
 
@@ -6270,14 +6291,14 @@ RecoveryRequiresIntParameter(const char *param_name, 
int currValue, int minValue
   currValue,
   minValue)));
 
-   SetRecoveryPause(true);
+   SetRecoveryPause(RECOVERY_PAUSED);
 
ereport(LOG,
(errmsg("recovery has paused"),
 errdetail("If recovery is unpaused, 
the server will shut down."),
 errhint("You can then restart the 
server after making the necessary configuration changes.")));
 
-   while (RecoveryIsPaused())
+   while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
{
HandleStartupProcInterrupts();



If a user call pg_wal_replay_pause while waiting in 
RecoveryRequiresIntParameter,
the state become 'pause requested' and this never returns to 'paused'.
Should we check recoveryPauseState in this loop as in recoveryPausesHere?


Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-02-07 Thread Dilip Kumar
On Sun, Feb 7, 2021 at 6:44 PM Bharath Rupireddy
 wrote:
>
> On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy
>  wrote:
> > > We can not do that, basically, under one lock we need to check the
> > > state and set it to pause.  Because by the time you release the lock
> > > someone might set it to RECOVERY_NOT_PAUSED then you don't want to set
> > > it to RECOVERY_PAUSED.
> >
> > Got it. Thanks.
>
> Hi Dilip, I have one more question:
>
> +/* test for recovery pause, if user has requested the pause */
> +if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
> +RECOVERY_PAUSE_REQUESTED)
> +recoveryPausesHere(false);
> +
> +now = GetCurrentTimestamp();
> +
>
> Do we need  now = GetCurrentTimestamp(); here? Because, I see that
> whenever the variable now is used within the for loop in
> WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being
> used within case XLOG_FROM_STREAM:
>
> Am I missing something?

Yeah, I don't see any reason for doing this, maybe it got copy pasted
by mistake.  Thanks for observing this.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From b5655514681317d997d8245068237acf5c6ada9d Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 27 Jan 2021 16:46:04 +0530
Subject: [PATCH v12] Provide a new interface to get the recovery pause status

Currently, pg_is_wal_replay_paused, just checks whether the recovery
pause is requested or not but it doesn't actually tell whether the
recovery is actually paused or not.  So basically there is no way for
the user to know the actual status of the pause request.  This patch
provides a new interface pg_get_wal_replay_pause_state that will
return the actual status of the recovery pause i.e.'not paused' if
pause is not requested, 'pause requested' if pause is requested but
recovery is not yet paused and 'paused' if recovery is actually paused.
---
 doc/src/sgml/func.sgml | 32 +++---
 src/backend/access/transam/xlog.c  | 62 --
 src/backend/access/transam/xlogfuncs.c | 50 ---
 src/include/access/xlog.h  | 12 +--
 src/include/catalog/pg_proc.dat|  4 +++
 5 files changed, 132 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1ab31a9..9b2c429 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -25285,7 +25285,24 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 boolean


-Returns true if recovery is paused.
+Returns true if recovery pause is requested.
+   
+  
+
+  
+   
+
+ pg_get_wal_replay_pause_state
+
+pg_get_wal_replay_pause_state ()
+text
+   
+   
+Returns recovery pause state.  The return values are 
+not paused if pause is not requested, 
+pause requested if pause is requested but recovery is
+not yet paused and, paused if the recovery is
+actually paused.

   
 
@@ -25324,10 +25341,15 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 void


-Pauses recovery.  While recovery is paused, no further database
-changes are applied.  If hot standby is active, all new queries will
-see the same consistent snapshot of the database, and no further query
-conflicts will be generated until recovery is resumed.
+Request to pause recovery.  A request doesn't mean that recovery stops
+right away.  If you want a guarantee that recovery is actually paused,
+you need to check for the recovery pause state returned by
+pg_get_wal_replay_pause_state().  Note that
+pg_is_wal_replay_paused() returns whether a request
+is made.  While recovery is paused, no further database changes are applied.
+If hot standby is active, all new queries will see the same consistent
+snapshot of the database, and no further query conflicts will be generated
+until recovery is resumed.


 This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8e3b5df..aa89e52 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -721,8 +721,8 @@ typedef struct XLogCtlData
 	 * only relevant for replication or archive recovery
 	 */
 	TimestampTz currentChunkStartTime;
-	/* Are we requested to pause recovery? */
-	bool		recoveryPause;
+	/* Recovery pause state */
+	RecoveryPauseState	recoveryPauseState;
 
 	/*
 	 * lastFpwDisableRecPtr points to the start of the last replayed
@@ -6019,7 +6019,7 @@ recoveryStopsAfter(XLogReaderState *record)
 }
 
 /*
- * Wait until shared recoveryPause flag is cleared.
+ * Wait until shared recoveryPauseState is set to 

Re: Is Recovery actually paused?

2021-02-07 Thread Bharath Rupireddy
On Fri, Feb 5, 2021 at 10:14 AM Bharath Rupireddy
 wrote:
> > We can not do that, basically, under one lock we need to check the
> > state and set it to pause.  Because by the time you release the lock
> > someone might set it to RECOVERY_NOT_PAUSED then you don't want to set
> > it to RECOVERY_PAUSED.
>
> Got it. Thanks.

Hi Dilip, I have one more question:

+/* test for recovery pause, if user has requested the pause */
+if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
+RECOVERY_PAUSE_REQUESTED)
+recoveryPausesHere(false);
+
+now = GetCurrentTimestamp();
+

Do we need  now = GetCurrentTimestamp(); here? Because, I see that
whenever the variable now is used within the for loop in
WaitForWALToBecomeAvailable, it's re-calculated anyways. It's being
used within case XLOG_FROM_STREAM:

Am I missing something?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-04 Thread Bharath Rupireddy
On Fri, Feb 5, 2021 at 10:06 AM Dilip Kumar  wrote:
>
> On Fri, Feb 5, 2021 at 6:22 AM Bharath Rupireddy
>  wrote:
> >
> > On Thu, Feb 4, 2021 at 7:20 PM Dilip Kumar  wrote:
> > > > How can we do that this is not a 1 byte flag this is enum so I don't
> > > > think we can read any atomic state without a spin lock here.
> > >
> > > I have fixed the other comments and the updated patch is attached.
> >
> > Can we just do like below so that we could use the existing
> > SetRecoveryPause instead of setting the state outside?
> >
> > /* loop until recoveryPauseState is set to RECOVERY_NOT_PAUSED */
> > while (1)
> > {
> > RecoveryPauseState state;
> >
> > state = GetRecoveryPauseState();
> >
> > if (state == RECOVERY_NOT_PAUSED)
> > break;
> >
> > HandleStartupProcInterrupts();
> >
> > if (CheckForStandbyTrigger())
> > return;
> > pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
> >
> > /*
> >  * If recovery pause is requested then set it paused.  While we are 
> > in
> >  * the loop, user might resume and pause again so set this every 
> > time.
> >  */
> > if (state == RECOVERY_PAUSE_REQUESTED)
> > SetRecoveryPause(RECOVERY_PAUSED)
>
> We can not do that, basically, under one lock we need to check the
> state and set it to pause.  Because by the time you release the lock
> someone might set it to RECOVERY_NOT_PAUSED then you don't want to set
> it to RECOVERY_PAUSED.

Got it. Thanks.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-04 Thread Dilip Kumar
On Fri, Feb 5, 2021 at 6:22 AM Bharath Rupireddy
 wrote:
>
> On Thu, Feb 4, 2021 at 7:20 PM Dilip Kumar  wrote:
> > > How can we do that this is not a 1 byte flag this is enum so I don't
> > > think we can read any atomic state without a spin lock here.
> >
> > I have fixed the other comments and the updated patch is attached.
>
> Can we just do like below so that we could use the existing
> SetRecoveryPause instead of setting the state outside?
>
> /* loop until recoveryPauseState is set to RECOVERY_NOT_PAUSED */
> while (1)
> {
> RecoveryPauseState state;
>
> state = GetRecoveryPauseState();
>
> if (state == RECOVERY_NOT_PAUSED)
> break;
>
> HandleStartupProcInterrupts();
>
> if (CheckForStandbyTrigger())
> return;
> pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
>
> /*
>  * If recovery pause is requested then set it paused.  While we are in
>  * the loop, user might resume and pause again so set this every time.
>  */
> if (state == RECOVERY_PAUSE_REQUESTED)
> SetRecoveryPause(RECOVERY_PAUSED)

We can not do that, basically, under one lock we need to check the
state and set it to pause.  Because by the time you release the lock
someone might set it to RECOVERY_NOT_PAUSED then you don't want to set
it to RECOVERY_PAUSED.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-04 Thread Dilip Kumar
On Thu, Feb 4, 2021 at 10:19 PM Robert Haas  wrote:
>
> On Thu, Feb 4, 2021 at 7:46 AM Dilip Kumar  wrote:
> > How can we do that this is not a 1 byte flag this is enum so I don't
> > think we can read any atomic state without a spin lock here.
>
> I think this discussion of atomics is confused. Let's talk about what
> atomic reads and writes mean. Imagine that you have a 64-bit value
> 0x0101010101010101. Somebody sets it to 0x0202020202020202. Imagine
> that just as they are doing that, someone else reads the value and
> gets 0x0202020201010101, because half of the value has been updated
> and the other half has not yet been updated yet. This kind of thing
> can actually happen on some platforms and what it means is that on
> those platforms 8-byte reads and writes are not atomic. The idea of an
> "atom" is that it can't be split into pieces but these reads and
> writes on some platforms are actually not "atomic" because they are
> split into two 4-byte pieces. But there's no such thing as a 1-byte
> read or write not being atomic. In theory you could imagine a computer
> where when you change 0x01 to 0x23 and read in the middle and see 0x21
> or 0x13 or something, but no actual computers behave that way, or at
> least no mainstream ones that anybody cares about. So the idea that
> you somehow need a lock to prevent this is just wrong.
>
> Concurrent programs also suffer from another problem which is
> reordering of operations, which can happen either as the program is
> compiled or as the program is executed by the CPU. The CPU can see you
> set a->x = 1 and a->y = 2 and decide to update y first and then x even
> though you wrote it the other way around in the program text. To
> prevent this, we have barrier operations; see README.barrier in the
> source tree for a longer explanation. Atomic operations like
> compare-and-exchange are also full barriers, so that they not only
> prevent the torn read/write problem described above, but also enforce
> order of operations more strictly.
>
> Now I don't know whether a lock is needed here or not. Maybe it is;
> perhaps for consistency with other code, perhaps because the lock
> acquire and release is serving the function of a barrier; or perhaps
> to guard against some other hazard. But saying that it's because
> reading or writing a 1-byte value might not be atomic does not sound
> correct.

I never told that reading /writing 1 byte is not atomic, of course,
they are.  I told that we can only guarantee that 1-byte read/write is
atomic but this variable is not a bool or 1-byte value and the enum
can take 32 bits on a 32-bit platform so we can not guarantee the
atomic read/write on some processor so we need a lock.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-04 Thread Bharath Rupireddy
On Thu, Feb 4, 2021 at 7:20 PM Dilip Kumar  wrote:
> > How can we do that this is not a 1 byte flag this is enum so I don't
> > think we can read any atomic state without a spin lock here.
>
> I have fixed the other comments and the updated patch is attached.

Can we just do like below so that we could use the existing
SetRecoveryPause instead of setting the state outside?

/* loop until recoveryPauseState is set to RECOVERY_NOT_PAUSED */
while (1)
{
RecoveryPauseState state;

state = GetRecoveryPauseState();

if (state == RECOVERY_NOT_PAUSED)
break;

HandleStartupProcInterrupts();

if (CheckForStandbyTrigger())
return;
pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);

/*
 * If recovery pause is requested then set it paused.  While we are in
 * the loop, user might resume and pause again so set this every time.
 */
if (state == RECOVERY_PAUSE_REQUESTED)
SetRecoveryPause(RECOVERY_PAUSED)

And a typo - it's "pg_is_wal_replay_paused" not
"pg_is_wal_repay_paused" +
pg_is_wal_repay_paused() returns whether a
request

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-04 Thread Robert Haas
On Thu, Feb 4, 2021 at 7:46 AM Dilip Kumar  wrote:
> How can we do that this is not a 1 byte flag this is enum so I don't
> think we can read any atomic state without a spin lock here.

I think this discussion of atomics is confused. Let's talk about what
atomic reads and writes mean. Imagine that you have a 64-bit value
0x0101010101010101. Somebody sets it to 0x0202020202020202. Imagine
that just as they are doing that, someone else reads the value and
gets 0x0202020201010101, because half of the value has been updated
and the other half has not yet been updated yet. This kind of thing
can actually happen on some platforms and what it means is that on
those platforms 8-byte reads and writes are not atomic. The idea of an
"atom" is that it can't be split into pieces but these reads and
writes on some platforms are actually not "atomic" because they are
split into two 4-byte pieces. But there's no such thing as a 1-byte
read or write not being atomic. In theory you could imagine a computer
where when you change 0x01 to 0x23 and read in the middle and see 0x21
or 0x13 or something, but no actual computers behave that way, or at
least no mainstream ones that anybody cares about. So the idea that
you somehow need a lock to prevent this is just wrong.

Concurrent programs also suffer from another problem which is
reordering of operations, which can happen either as the program is
compiled or as the program is executed by the CPU. The CPU can see you
set a->x = 1 and a->y = 2 and decide to update y first and then x even
though you wrote it the other way around in the program text. To
prevent this, we have barrier operations; see README.barrier in the
source tree for a longer explanation. Atomic operations like
compare-and-exchange are also full barriers, so that they not only
prevent the torn read/write problem described above, but also enforce
order of operations more strictly.

Now I don't know whether a lock is needed here or not. Maybe it is;
perhaps for consistency with other code, perhaps because the lock
acquire and release is serving the function of a barrier; or perhaps
to guard against some other hazard. But saying that it's because
reading or writing a 1-byte value might not be atomic does not sound
correct.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-04 Thread Dilip Kumar
On Thu, Feb 4, 2021 at 6:16 PM Dilip Kumar  wrote:
>
> On Thu, Feb 4, 2021 at 4:58 PM Bharath Rupireddy
>  wrote:
> >
> > On Thu, Feb 4, 2021 at 10:28 AM Dilip Kumar  wrote:
> > > Please find an updated patch which addresses these comments.
> >
> > Thanks for the patch. I tested the new function 
> > pg_get_wal_replay_pause_state:
> >
> > postgres=# select pg_get_wal_replay_pause_state();
> >  pg_get_wal_replay_pause_state
> > ---
> >  not paused
> > postgres=# select pg_wal_replay_pause();
> >  pg_wal_replay_pause
> > -
> >
> > (1 row)
> >
> > I can also see the "pause requested" state after I put a gdb
> > breakpoint in WaitForWALToBecomeAvailable in the standby startup
> > process .
> >
> > postgres=# select pg_get_wal_replay_pause_state();
> >  pg_get_wal_replay_pause_state
> > ---
> >  pause requested
> > (1 row)
> >
> > postgres=# select pg_get_wal_replay_pause_state();
> >  pg_get_wal_replay_pause_state
> > ---
> >  paused
> > (1 row)
> >
> > Mostly, the v10 patch looks good to me, except below minor comments:
> >
> > 1) A typo in commit message - "just check" --> "just checks"
> >
> > 2) How about
> > +Returns recovery pause state. The return values are not 
> > paused
> > instead of
> > +Returns recovery pause state, the return values are not 
> > paused
> >
> > 3) I think it is 'get wal replay pause state', instead of { oid =>
> > '1137', descr => 'get wal replay is pause state',
> >
> > 4) can we just do this
> > /*
> >  * If recovery pause is requested then set it paused.  While we are 
> > in
> >  * the loop, user might resume and pause again so set this every 
> > time.
> >  */
> > if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
> > RECOVERY_PAUSE_REQUESTED)
> > SetRecoveryPause(RECOVERY_PAUSED);
> > instead of
> > /*
> >  * If recovery pause is requested then set it paused.  While we are 
> > in
> >  * the loop, user might resume and pause again so set this every 
> > time.
> >  */
> > SpinLockAcquire(>info_lck);
> > if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED)
> > XLogCtl->recoveryPauseState = RECOVERY_PAUSED;
> > SpinLockRelease(>info_lck);
> >
> > I think it's okay, since we take a spinlock anyways in
> > GetRecoveryPauseState(). See the below comment and also a relevant
> > commit 6ba4ecbf477e0b25dd7bde1b0c4e07fc2da19348 on why it's not
> > necessary taking spinlock always:
> > /*
> >  * Pause WAL replay, if requested by a hot-standby session 
> > via
> >  * SetRecoveryPause().
> >  *
> >  * Note that we intentionally don't take the info_lck 
> > spinlock
> >  * here.  We might therefore read a slightly stale value of
> >  * the recoveryPause flag, but it can't be very stale (no
> >  * worse than the last spinlock we did acquire).  Since a
> >  * pause request is a pretty asynchronous thing anyway,
> >  * possibly responding to it one WAL record later than we
> >  * otherwise would is a minor issue, so it doesn't seem 
> > worth
> >  * adding another spinlock cycle to prevent that.
> >  */
>
> How can we do that this is not a 1 byte flag this is enum so I don't
> think we can read any atomic state without a spin lock here.

I have fixed the other comments and the updated patch is attached.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 0d7059a8809cbd28e7a4668c6b99e0c297c564f3 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Wed, 27 Jan 2021 16:46:04 +0530
Subject: [PATCH v11] Provide a new interface to get the recovery pause status

Currently, pg_is_wal_replay_paused, just checks whether the recovery
pause is requested or not but it doesn't actually tell whether the
recovery is actually paused or not.  So basically there is no way for
the user to know the actual status of the pause request.  This patch
provides a new interface pg_get_wal_replay_pause_state that will
return the actual status of the recovery pause i.e.'not paused' if
pause is not requested, 'pause requested' if pause is requested but
recovery is not yet paused and 'paused' if recovery is actually paused.
---
 doc/src/sgml/func.sgml | 32 ++---
 src/backend/access/transam/xlog.c  | 64 +-
 src/backend/access/transam/xlogfuncs.c | 50 +++---
 src/include/access/xlog.h  | 12 +--
 src/include/catalog/pg_proc.dat|  4 +++
 5 files changed, 134 insertions(+), 28 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f30eaa3..eb131a6 100644
--- a/doc/src/sgml/func.sgml
+++ 

Re: Is Recovery actually paused?

2021-02-04 Thread Dilip Kumar
On Thu, Feb 4, 2021 at 4:58 PM Bharath Rupireddy
 wrote:
>
> On Thu, Feb 4, 2021 at 10:28 AM Dilip Kumar  wrote:
> > Please find an updated patch which addresses these comments.
>
> Thanks for the patch. I tested the new function pg_get_wal_replay_pause_state:
>
> postgres=# select pg_get_wal_replay_pause_state();
>  pg_get_wal_replay_pause_state
> ---
>  not paused
> postgres=# select pg_wal_replay_pause();
>  pg_wal_replay_pause
> -
>
> (1 row)
>
> I can also see the "pause requested" state after I put a gdb
> breakpoint in WaitForWALToBecomeAvailable in the standby startup
> process .
>
> postgres=# select pg_get_wal_replay_pause_state();
>  pg_get_wal_replay_pause_state
> ---
>  pause requested
> (1 row)
>
> postgres=# select pg_get_wal_replay_pause_state();
>  pg_get_wal_replay_pause_state
> ---
>  paused
> (1 row)
>
> Mostly, the v10 patch looks good to me, except below minor comments:
>
> 1) A typo in commit message - "just check" --> "just checks"
>
> 2) How about
> +Returns recovery pause state. The return values are not 
> paused
> instead of
> +Returns recovery pause state, the return values are not 
> paused
>
> 3) I think it is 'get wal replay pause state', instead of { oid =>
> '1137', descr => 'get wal replay is pause state',
>
> 4) can we just do this
> /*
>  * If recovery pause is requested then set it paused.  While we are in
>  * the loop, user might resume and pause again so set this every time.
>  */
> if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
> RECOVERY_PAUSE_REQUESTED)
> SetRecoveryPause(RECOVERY_PAUSED);
> instead of
> /*
>  * If recovery pause is requested then set it paused.  While we are in
>  * the loop, user might resume and pause again so set this every time.
>  */
> SpinLockAcquire(>info_lck);
> if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED)
> XLogCtl->recoveryPauseState = RECOVERY_PAUSED;
> SpinLockRelease(>info_lck);
>
> I think it's okay, since we take a spinlock anyways in
> GetRecoveryPauseState(). See the below comment and also a relevant
> commit 6ba4ecbf477e0b25dd7bde1b0c4e07fc2da19348 on why it's not
> necessary taking spinlock always:
> /*
>  * Pause WAL replay, if requested by a hot-standby session via
>  * SetRecoveryPause().
>  *
>  * Note that we intentionally don't take the info_lck spinlock
>  * here.  We might therefore read a slightly stale value of
>  * the recoveryPause flag, but it can't be very stale (no
>  * worse than the last spinlock we did acquire).  Since a
>  * pause request is a pretty asynchronous thing anyway,
>  * possibly responding to it one WAL record later than we
>  * otherwise would is a minor issue, so it doesn't seem worth
>  * adding another spinlock cycle to prevent that.
>  */

How can we do that this is not a 1 byte flag this is enum so I don't
think we can read any atomic state without a spin lock here.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-04 Thread Dilip Kumar
On Thu, Feb 4, 2021 at 4:58 PM Bharath Rupireddy
 wrote:
>
> On Thu, Feb 4, 2021 at 10:28 AM Dilip Kumar  wrote:
> > Please find an updated patch which addresses these comments.
>
> Thanks for the patch. I tested the new function pg_get_wal_replay_pause_state:
>
> postgres=# select pg_get_wal_replay_pause_state();
>  pg_get_wal_replay_pause_state
> ---
>  not paused
> postgres=# select pg_wal_replay_pause();
>  pg_wal_replay_pause
> -
>
> (1 row)
>
> I can also see the "pause requested" state after I put a gdb
> breakpoint in WaitForWALToBecomeAvailable in the standby startup
> process .
>
> postgres=# select pg_get_wal_replay_pause_state();
>  pg_get_wal_replay_pause_state
> ---
>  pause requested
> (1 row)
>
> postgres=# select pg_get_wal_replay_pause_state();
>  pg_get_wal_replay_pause_state
> ---
>  paused
> (1 row)
>
> Mostly, the v10 patch looks good to me, except below minor comments:

Thanks for the testing.

> 1) A typo in commit message - "just check" --> "just checks"
>
> 2) How about
> +Returns recovery pause state. The return values are not 
> paused
> instead of
> +Returns recovery pause state, the return values are not 
> paused
>
> 3) I think it is 'get wal replay pause state', instead of { oid =>
> '1137', descr => 'get wal replay is pause state',
>
> 4) can we just do this
> /*
>  * If recovery pause is requested then set it paused.  While we are in
>  * the loop, user might resume and pause again so set this every time.
>  */
> if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
> RECOVERY_PAUSE_REQUESTED)
> SetRecoveryPause(RECOVERY_PAUSED);
> instead of
> /*
>  * If recovery pause is requested then set it paused.  While we are in
>  * the loop, user might resume and pause again so set this every time.
>  */
> SpinLockAcquire(>info_lck);
> if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED)
> XLogCtl->recoveryPauseState = RECOVERY_PAUSED;
> SpinLockRelease(>info_lck);
>
> I think it's okay, since we take a spinlock anyways in
> GetRecoveryPauseState(). See the below comment and also a relevant
> commit 6ba4ecbf477e0b25dd7bde1b0c4e07fc2da19348 on why it's not
> necessary taking spinlock always:
> /*
>  * Pause WAL replay, if requested by a hot-standby session via
>  * SetRecoveryPause().
>  *
>  * Note that we intentionally don't take the info_lck spinlock
>  * here.  We might therefore read a slightly stale value of
>  * the recoveryPause flag, but it can't be very stale (no
>  * worse than the last spinlock we did acquire).  Since a
>  * pause request is a pretty asynchronous thing anyway,
>  * possibly responding to it one WAL record later than we
>  * otherwise would is a minor issue, so it doesn't seem worth
>  * adding another spinlock cycle to prevent that.
>  */
>

I will work on these comments and send the updated patch soon.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-04 Thread Bharath Rupireddy
On Thu, Feb 4, 2021 at 10:28 AM Dilip Kumar  wrote:
> Please find an updated patch which addresses these comments.

Thanks for the patch. I tested the new function pg_get_wal_replay_pause_state:

postgres=# select pg_get_wal_replay_pause_state();
 pg_get_wal_replay_pause_state
---
 not paused
postgres=# select pg_wal_replay_pause();
 pg_wal_replay_pause
-

(1 row)

I can also see the "pause requested" state after I put a gdb
breakpoint in WaitForWALToBecomeAvailable in the standby startup
process .

postgres=# select pg_get_wal_replay_pause_state();
 pg_get_wal_replay_pause_state
---
 pause requested
(1 row)

postgres=# select pg_get_wal_replay_pause_state();
 pg_get_wal_replay_pause_state
---
 paused
(1 row)

Mostly, the v10 patch looks good to me, except below minor comments:

1) A typo in commit message - "just check" --> "just checks"

2) How about
+Returns recovery pause state. The return values are not paused
instead of
+Returns recovery pause state, the return values are not paused

3) I think it is 'get wal replay pause state', instead of { oid =>
'1137', descr => 'get wal replay is pause state',

4) can we just do this
/*
 * If recovery pause is requested then set it paused.  While we are in
 * the loop, user might resume and pause again so set this every time.
 */
if (((volatile XLogCtlData *) XLogCtl)->recoveryPauseState ==
RECOVERY_PAUSE_REQUESTED)
SetRecoveryPause(RECOVERY_PAUSED);
instead of
/*
 * If recovery pause is requested then set it paused.  While we are in
 * the loop, user might resume and pause again so set this every time.
 */
SpinLockAcquire(>info_lck);
if (XLogCtl->recoveryPauseState == RECOVERY_PAUSE_REQUESTED)
XLogCtl->recoveryPauseState = RECOVERY_PAUSED;
SpinLockRelease(>info_lck);

I think it's okay, since we take a spinlock anyways in
GetRecoveryPauseState(). See the below comment and also a relevant
commit 6ba4ecbf477e0b25dd7bde1b0c4e07fc2da19348 on why it's not
necessary taking spinlock always:
/*
 * Pause WAL replay, if requested by a hot-standby session via
 * SetRecoveryPause().
 *
 * Note that we intentionally don't take the info_lck spinlock
 * here.  We might therefore read a slightly stale value of
 * the recoveryPause flag, but it can't be very stale (no
 * worse than the last spinlock we did acquire).  Since a
 * pause request is a pretty asynchronous thing anyway,
 * possibly responding to it one WAL record later than we
 * otherwise would is a minor issue, so it doesn't seem worth
 * adding another spinlock cycle to prevent that.
 */

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-02-03 Thread Dilip Kumar
On Mon, Feb 1, 2021 at 1:41 PM Dilip Kumar  wrote:
>
> On Mon, Feb 1, 2021 at 11:59 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar  
> > wrote in
> > > On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar  wrote:
> > > >
> > > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Thu, 28 Jan 2021 09:55:42 +0530
> > > > > Dilip Kumar  wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > > > > Dilip Kumar  wrote:
> > > > > > > >
> > > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > +1 to just show the recovery pause state in the output 
> > > > > > > > > > > > of
> > > > > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, 
> > > > > > > > > > > > when "is" exists
> > > > > > > > > > > > in a function, I expect a boolean output. Others may 
> > > > > > > > > > > > have better
> > > > > > > > > > > > thoughts.
> > > > > > > > > > >
> > > > > > > > > > > Maybe we should leave the existing function 
> > > > > > > > > > > pg_is_wal_replay_paused()
> > > > > > > > > > > alone and add a new one with the name you suggest that 
> > > > > > > > > > > returns text.
> > > > > > > > > > > That would create less burden for tool authors.
> > > > > > > > > >
> > > > > > > > > > +1
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > > > > >
> > > > > > > > This means pg_is_wal_replay_paused is left without any change 
> > > > > > > > and this
> > > > > > > > returns whether pause is requested or not? If so, it seems good 
> > > > > > > > to modify
> > > > > > > > the documentation of this function in order to note that this 
> > > > > > > > could not
> > > > > > > > return the actual pause state.
> > > > > > >
> > > > > > > Yes, we can say that it will return true if the replay pause is
> > > > > > > requested.  I am changing that in my new patch.
> > > > > >
> > > > > > I have modified the patch, changes
> > > > > >
> > > > > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > > > > the pause request state
> > > > > > - Now, we are not waiting for the recovery to actually get paused 
> > > > > > so I
> > > > > > think it doesn't make sense to put a lot of checkpoints to check the
> > > > > > pause requested so I have removed that check from the
> > > > > > recoveryApplyDelay but I think it better we still keep that check in
> > > > > > the WaitForWalToBecomeAvailable because it can wait forever before 
> > > > > > the
> > > > > > next wal get available.
> > > > >
> > > > > I think basically the check in WaitForWalToBecomeAvailable is 
> > > > > independent
> > > > > of the feature of pg_get_wal_replay_pause_state, that is, reporting 
> > > > > the
> > > > > actual pause state.  This function could just return 'pause requested'
> > > > > if a pause is requested during waiting for WAL.
> > > > >
> > > > > However, I agree the change to allow recovery to transit the state to
> > > > > 'paused' during WAL waiting because 'paused' has more useful 
> > > > > information
> > > > > for users than 'pause requested'.  Returning 'paused' lets users know
> > > > > clearly that no more WAL are applied until recovery is resumed.  On 
> > > > > the
> > > > > other hand, when 'pause requested' is returned, user can't say whether
> > > > > the next WAL wiill be applied or not from this information.
> > > > >
> > > > > For the same reason, I think it is also useful to call 
> > > > > recoveryPausesHere
> > > > > in recoveryApplyDelay.
> > > >
> > > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get
> > > > available so it can not be controlled by user so it is good to put a
> > > > check for the recovery pause,  however recoveryApplyDelay wait for the
> > > > apply delay which is configured by user and it is predictable value by
> > > > the user.  I don't have much objection to putting that check in the
> > > > recoveryApplyDelay as well but I feel it is not necessary.  Any other
> > > > thoughts on this?
> > > >
> > > > > In addition, in RecoveryRequiresIntParameter, recovery should get 
> > > > > paused
> > > > > if a parameter value has a problem.  However, 
> > > > > pg_get_wal_replay_pause_state
> > > > > will return 'pause requested' in this case. So, I think, we should 
> > > > > pass
> > > > > 

Re: Is Recovery actually paused?

2021-02-01 Thread Dilip Kumar
On Mon, Feb 1, 2021 at 11:59 AM Kyotaro Horiguchi
 wrote:
>
> At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar  wrote 
> in
> > On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar  wrote:
> > >
> > > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
> > > >
> > > > On Thu, 28 Jan 2021 09:55:42 +0530
> > > > Dilip Kumar  wrote:
> > > >
> > > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > > > Dilip Kumar  wrote:
> > > > > > >
> > > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > > > >  wrote:
> > > > > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, 
> > > > > > > > > > > when "is" exists
> > > > > > > > > > > in a function, I expect a boolean output. Others may have 
> > > > > > > > > > > better
> > > > > > > > > > > thoughts.
> > > > > > > > > >
> > > > > > > > > > Maybe we should leave the existing function 
> > > > > > > > > > pg_is_wal_replay_paused()
> > > > > > > > > > alone and add a new one with the name you suggest that 
> > > > > > > > > > returns text.
> > > > > > > > > > That would create less burden for tool authors.
> > > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > > > >
> > > > > > > This means pg_is_wal_replay_paused is left without any change and 
> > > > > > > this
> > > > > > > returns whether pause is requested or not? If so, it seems good 
> > > > > > > to modify
> > > > > > > the documentation of this function in order to note that this 
> > > > > > > could not
> > > > > > > return the actual pause state.
> > > > > >
> > > > > > Yes, we can say that it will return true if the replay pause is
> > > > > > requested.  I am changing that in my new patch.
> > > > >
> > > > > I have modified the patch, changes
> > > > >
> > > > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > > > the pause request state
> > > > > - Now, we are not waiting for the recovery to actually get paused so I
> > > > > think it doesn't make sense to put a lot of checkpoints to check the
> > > > > pause requested so I have removed that check from the
> > > > > recoveryApplyDelay but I think it better we still keep that check in
> > > > > the WaitForWalToBecomeAvailable because it can wait forever before the
> > > > > next wal get available.
> > > >
> > > > I think basically the check in WaitForWalToBecomeAvailable is 
> > > > independent
> > > > of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> > > > actual pause state.  This function could just return 'pause requested'
> > > > if a pause is requested during waiting for WAL.
> > > >
> > > > However, I agree the change to allow recovery to transit the state to
> > > > 'paused' during WAL waiting because 'paused' has more useful information
> > > > for users than 'pause requested'.  Returning 'paused' lets users know
> > > > clearly that no more WAL are applied until recovery is resumed.  On the
> > > > other hand, when 'pause requested' is returned, user can't say whether
> > > > the next WAL wiill be applied or not from this information.
> > > >
> > > > For the same reason, I think it is also useful to call 
> > > > recoveryPausesHere
> > > > in recoveryApplyDelay.
> > >
> > > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get
> > > available so it can not be controlled by user so it is good to put a
> > > check for the recovery pause,  however recoveryApplyDelay wait for the
> > > apply delay which is configured by user and it is predictable value by
> > > the user.  I don't have much objection to putting that check in the
> > > recoveryApplyDelay as well but I feel it is not necessary.  Any other
> > > thoughts on this?
> > >
> > > > In addition, in RecoveryRequiresIntParameter, recovery should get paused
> > > > if a parameter value has a problem.  However, 
> > > > pg_get_wal_replay_pause_state
> > > > will return 'pause requested' in this case. So, I think, we should pass
> > > > RECOVERY_PAUSED to SetRecoveryPause() instead of 
> > > > RECOVERY_PAUSE_REQUESTED,
> > > > or call CheckAndSetRecoveryPause() in the loop like 
> > > > recoveryPausesHere().
> > >
> > > Yeah, absolutely right, it must pass RECOVERY_PAUSED.  I will change
> > > this, thanks for noticing this.
> >
> > I have changed this in the new patch.
>
> It seems to work well. The 

Re: Is Recovery actually paused?

2021-01-31 Thread Kyotaro Horiguchi
At Sun, 31 Jan 2021 11:24:30 +0530, Dilip Kumar  wrote 
in 
> On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar  wrote:
> >
> > On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
> > >
> > > On Thu, 28 Jan 2021 09:55:42 +0530
> > > Dilip Kumar  wrote:
> > >
> > > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  
> > > > wrote:
> > > > >
> > > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > > Dilip Kumar  wrote:
> > > > > >
> > > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > > >  wrote:
> > > > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when 
> > > > > > > > > > "is" exists
> > > > > > > > > > in a function, I expect a boolean output. Others may have 
> > > > > > > > > > better
> > > > > > > > > > thoughts.
> > > > > > > > >
> > > > > > > > > Maybe we should leave the existing function 
> > > > > > > > > pg_is_wal_replay_paused()
> > > > > > > > > alone and add a new one with the name you suggest that 
> > > > > > > > > returns text.
> > > > > > > > > That would create less burden for tool authors.
> > > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > >
> > > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > > >
> > > > > > This means pg_is_wal_replay_paused is left without any change and 
> > > > > > this
> > > > > > returns whether pause is requested or not? If so, it seems good to 
> > > > > > modify
> > > > > > the documentation of this function in order to note that this could 
> > > > > > not
> > > > > > return the actual pause state.
> > > > >
> > > > > Yes, we can say that it will return true if the replay pause is
> > > > > requested.  I am changing that in my new patch.
> > > >
> > > > I have modified the patch, changes
> > > >
> > > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > > the pause request state
> > > > - Now, we are not waiting for the recovery to actually get paused so I
> > > > think it doesn't make sense to put a lot of checkpoints to check the
> > > > pause requested so I have removed that check from the
> > > > recoveryApplyDelay but I think it better we still keep that check in
> > > > the WaitForWalToBecomeAvailable because it can wait forever before the
> > > > next wal get available.
> > >
> > > I think basically the check in WaitForWalToBecomeAvailable is independent
> > > of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> > > actual pause state.  This function could just return 'pause requested'
> > > if a pause is requested during waiting for WAL.
> > >
> > > However, I agree the change to allow recovery to transit the state to
> > > 'paused' during WAL waiting because 'paused' has more useful information
> > > for users than 'pause requested'.  Returning 'paused' lets users know
> > > clearly that no more WAL are applied until recovery is resumed.  On the
> > > other hand, when 'pause requested' is returned, user can't say whether
> > > the next WAL wiill be applied or not from this information.
> > >
> > > For the same reason, I think it is also useful to call recoveryPausesHere
> > > in recoveryApplyDelay.
> >
> > IMHO the WaitForWalToBecomeAvailable can wait until the next wal get
> > available so it can not be controlled by user so it is good to put a
> > check for the recovery pause,  however recoveryApplyDelay wait for the
> > apply delay which is configured by user and it is predictable value by
> > the user.  I don't have much objection to putting that check in the
> > recoveryApplyDelay as well but I feel it is not necessary.  Any other
> > thoughts on this?
> >
> > > In addition, in RecoveryRequiresIntParameter, recovery should get paused
> > > if a parameter value has a problem.  However, 
> > > pg_get_wal_replay_pause_state
> > > will return 'pause requested' in this case. So, I think, we should pass
> > > RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
> > > or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().
> >
> > Yeah, absolutely right, it must pass RECOVERY_PAUSED.  I will change
> > this, thanks for noticing this.
> 
> I have changed this in the new patch.

It seems to work well. The checkpoints seems to be placed properly.

+SetRecoveryPause(RecoveryPauseState state)
 {
+   Assert(state >= RECOVERY_NOT_PAUSED && state <= RECOVERY_PAUSED);

I'm not sure that state worth FATAL.  Isn't it enough to just ERROR
out like XLogFileRead?

CheckAndSetRecovery() has only one caller.  

Re: Is Recovery actually paused?

2021-01-30 Thread Dilip Kumar
On Fri, Jan 29, 2021 at 4:33 PM Dilip Kumar  wrote:
>
> On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
> >
> > On Thu, 28 Jan 2021 09:55:42 +0530
> > Dilip Kumar  wrote:
> >
> > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > Dilip Kumar  wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > >  wrote:
> > > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when 
> > > > > > > > > "is" exists
> > > > > > > > > in a function, I expect a boolean output. Others may have 
> > > > > > > > > better
> > > > > > > > > thoughts.
> > > > > > > >
> > > > > > > > Maybe we should leave the existing function 
> > > > > > > > pg_is_wal_replay_paused()
> > > > > > > > alone and add a new one with the name you suggest that returns 
> > > > > > > > text.
> > > > > > > > That would create less burden for tool authors.
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > >
> > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > >
> > > > > This means pg_is_wal_replay_paused is left without any change and this
> > > > > returns whether pause is requested or not? If so, it seems good to 
> > > > > modify
> > > > > the documentation of this function in order to note that this could 
> > > > > not
> > > > > return the actual pause state.
> > > >
> > > > Yes, we can say that it will return true if the replay pause is
> > > > requested.  I am changing that in my new patch.
> > >
> > > I have modified the patch, changes
> > >
> > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > the pause request state
> > > - Now, we are not waiting for the recovery to actually get paused so I
> > > think it doesn't make sense to put a lot of checkpoints to check the
> > > pause requested so I have removed that check from the
> > > recoveryApplyDelay but I think it better we still keep that check in
> > > the WaitForWalToBecomeAvailable because it can wait forever before the
> > > next wal get available.
> >
> > I think basically the check in WaitForWalToBecomeAvailable is independent
> > of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> > actual pause state.  This function could just return 'pause requested'
> > if a pause is requested during waiting for WAL.
> >
> > However, I agree the change to allow recovery to transit the state to
> > 'paused' during WAL waiting because 'paused' has more useful information
> > for users than 'pause requested'.  Returning 'paused' lets users know
> > clearly that no more WAL are applied until recovery is resumed.  On the
> > other hand, when 'pause requested' is returned, user can't say whether
> > the next WAL wiill be applied or not from this information.
> >
> > For the same reason, I think it is also useful to call recoveryPausesHere
> > in recoveryApplyDelay.
>
> IMHO the WaitForWalToBecomeAvailable can wait until the next wal get
> available so it can not be controlled by user so it is good to put a
> check for the recovery pause,  however recoveryApplyDelay wait for the
> apply delay which is configured by user and it is predictable value by
> the user.  I don't have much objection to putting that check in the
> recoveryApplyDelay as well but I feel it is not necessary.  Any other
> thoughts on this?
>
> > In addition, in RecoveryRequiresIntParameter, recovery should get paused
> > if a parameter value has a problem.  However, pg_get_wal_replay_pause_state
> > will return 'pause requested' in this case. So, I think, we should pass
> > RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
> > or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().
>
> Yeah, absolutely right, it must pass RECOVERY_PAUSED.  I will change
> this, thanks for noticing this.

I have changed this in the new patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v9-0001-Provide-a-new-interface-to-get-the-recovery-pause.patch
Description: Binary data


Re: Is Recovery actually paused?

2021-01-29 Thread Yugo NAGATA
On Fri, 29 Jan 2021 16:33:32 +0530
Dilip Kumar  wrote:

> On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
> >
> > On Thu, 28 Jan 2021 09:55:42 +0530
> > Dilip Kumar  wrote:
> >
> > > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > > Dilip Kumar  wrote:
> > > > >
> > > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > > >  wrote:
> > > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when 
> > > > > > > > > "is" exists
> > > > > > > > > in a function, I expect a boolean output. Others may have 
> > > > > > > > > better
> > > > > > > > > thoughts.
> > > > > > > >
> > > > > > > > Maybe we should leave the existing function 
> > > > > > > > pg_is_wal_replay_paused()
> > > > > > > > alone and add a new one with the name you suggest that returns 
> > > > > > > > text.
> > > > > > > > That would create less burden for tool authors.
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > >
> > > > > > Yeah, we can do that, I will send an updated patch soon.
> > > > >
> > > > > This means pg_is_wal_replay_paused is left without any change and this
> > > > > returns whether pause is requested or not? If so, it seems good to 
> > > > > modify
> > > > > the documentation of this function in order to note that this could 
> > > > > not
> > > > > return the actual pause state.
> > > >
> > > > Yes, we can say that it will return true if the replay pause is
> > > > requested.  I am changing that in my new patch.
> > >
> > > I have modified the patch, changes
> > >
> > > - I have added a new interface pg_get_wal_replay_pause_state to get
> > > the pause request state
> > > - Now, we are not waiting for the recovery to actually get paused so I
> > > think it doesn't make sense to put a lot of checkpoints to check the
> > > pause requested so I have removed that check from the
> > > recoveryApplyDelay but I think it better we still keep that check in
> > > the WaitForWalToBecomeAvailable because it can wait forever before the
> > > next wal get available.
> >
> > I think basically the check in WaitForWalToBecomeAvailable is independent
> > of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> > actual pause state.  This function could just return 'pause requested'
> > if a pause is requested during waiting for WAL.
> >
> > However, I agree the change to allow recovery to transit the state to
> > 'paused' during WAL waiting because 'paused' has more useful information
> > for users than 'pause requested'.  Returning 'paused' lets users know
> > clearly that no more WAL are applied until recovery is resumed.  On the
> > other hand, when 'pause requested' is returned, user can't say whether
> > the next WAL wiill be applied or not from this information.
> >
> > For the same reason, I think it is also useful to call recoveryPausesHere
> > in recoveryApplyDelay.
> 
> IMHO the WaitForWalToBecomeAvailable can wait until the next wal get
> available so it can not be controlled by user so it is good to put a
> check for the recovery pause,  however recoveryApplyDelay wait for the
> apply delay which is configured by user and it is predictable value by
> the user.  I don't have much objection to putting that check in the
> recoveryApplyDelay as well but I feel it is not necessary.  Any other
> thoughts on this?

I'm not sure if the user can figure out easily that the reason why
pg_get_wal_replay_pause_state returns 'pause requested' is due to
recovery_min_apply_delay because it would needs knowledge of the
internal mechanism of recovery.  However, if there are not any other
opinions of it, I don't care that recoveryApplyDelay is left as is
because such check and state transition is independent of the goal of
pg_get_wal_replay_pause_state itself as I mentioned above.

Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-01-29 Thread Dilip Kumar
On Fri, Jan 29, 2021 at 3:25 PM Yugo NAGATA  wrote:
>
> On Thu, 28 Jan 2021 09:55:42 +0530
> Dilip Kumar  wrote:
>
> > On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
> > > >
> > > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > > Dilip Kumar  wrote:
> > > >
> > > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > > >  wrote:
> > > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" 
> > > > > > > > exists
> > > > > > > > in a function, I expect a boolean output. Others may have better
> > > > > > > > thoughts.
> > > > > > >
> > > > > > > Maybe we should leave the existing function 
> > > > > > > pg_is_wal_replay_paused()
> > > > > > > alone and add a new one with the name you suggest that returns 
> > > > > > > text.
> > > > > > > That would create less burden for tool authors.
> > > > > >
> > > > > > +1
> > > > > >
> > > > >
> > > > > Yeah, we can do that, I will send an updated patch soon.
> > > >
> > > > This means pg_is_wal_replay_paused is left without any change and this
> > > > returns whether pause is requested or not? If so, it seems good to 
> > > > modify
> > > > the documentation of this function in order to note that this could not
> > > > return the actual pause state.
> > >
> > > Yes, we can say that it will return true if the replay pause is
> > > requested.  I am changing that in my new patch.
> >
> > I have modified the patch, changes
> >
> > - I have added a new interface pg_get_wal_replay_pause_state to get
> > the pause request state
> > - Now, we are not waiting for the recovery to actually get paused so I
> > think it doesn't make sense to put a lot of checkpoints to check the
> > pause requested so I have removed that check from the
> > recoveryApplyDelay but I think it better we still keep that check in
> > the WaitForWalToBecomeAvailable because it can wait forever before the
> > next wal get available.
>
> I think basically the check in WaitForWalToBecomeAvailable is independent
> of the feature of pg_get_wal_replay_pause_state, that is, reporting the
> actual pause state.  This function could just return 'pause requested'
> if a pause is requested during waiting for WAL.
>
> However, I agree the change to allow recovery to transit the state to
> 'paused' during WAL waiting because 'paused' has more useful information
> for users than 'pause requested'.  Returning 'paused' lets users know
> clearly that no more WAL are applied until recovery is resumed.  On the
> other hand, when 'pause requested' is returned, user can't say whether
> the next WAL wiill be applied or not from this information.
>
> For the same reason, I think it is also useful to call recoveryPausesHere
> in recoveryApplyDelay.

IMHO the WaitForWalToBecomeAvailable can wait until the next wal get
available so it can not be controlled by user so it is good to put a
check for the recovery pause,  however recoveryApplyDelay wait for the
apply delay which is configured by user and it is predictable value by
the user.  I don't have much objection to putting that check in the
recoveryApplyDelay as well but I feel it is not necessary.  Any other
thoughts on this?

> In addition, in RecoveryRequiresIntParameter, recovery should get paused
> if a parameter value has a problem.  However, pg_get_wal_replay_pause_state
> will return 'pause requested' in this case. So, I think, we should pass
> RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
> or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().

Yeah, absolutely right, it must pass RECOVERY_PAUSED.  I will change
this, thanks for noticing this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-29 Thread Yugo NAGATA
On Thu, 28 Jan 2021 09:55:42 +0530
Dilip Kumar  wrote:

> On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  wrote:
> >
> > On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
> > >
> > > On Wed, 27 Jan 2021 13:29:23 +0530
> > > Dilip Kumar  wrote:
> > >
> > > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada 
> > > >  wrote:
> > > > >
> > > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas  
> > > > > wrote:
> > > > > >
> > > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > > >  wrote:
> > > > > > > +1 to just show the recovery pause state in the output of
> > > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" 
> > > > > > > exists
> > > > > > > in a function, I expect a boolean output. Others may have better
> > > > > > > thoughts.
> > > > > >
> > > > > > Maybe we should leave the existing function 
> > > > > > pg_is_wal_replay_paused()
> > > > > > alone and add a new one with the name you suggest that returns text.
> > > > > > That would create less burden for tool authors.
> > > > >
> > > > > +1
> > > > >
> > > >
> > > > Yeah, we can do that, I will send an updated patch soon.
> > >
> > > This means pg_is_wal_replay_paused is left without any change and this
> > > returns whether pause is requested or not? If so, it seems good to modify
> > > the documentation of this function in order to note that this could not
> > > return the actual pause state.
> >
> > Yes, we can say that it will return true if the replay pause is
> > requested.  I am changing that in my new patch.
> 
> I have modified the patch, changes
> 
> - I have added a new interface pg_get_wal_replay_pause_state to get
> the pause request state
> - Now, we are not waiting for the recovery to actually get paused so I
> think it doesn't make sense to put a lot of checkpoints to check the
> pause requested so I have removed that check from the
> recoveryApplyDelay but I think it better we still keep that check in
> the WaitForWalToBecomeAvailable because it can wait forever before the
> next wal get available.

I think basically the check in WaitForWalToBecomeAvailable is independent
of the feature of pg_get_wal_replay_pause_state, that is, reporting the
actual pause state.  This function could just return 'pause requested' 
if a pause is requested during waiting for WAL.

However, I agree the change to allow recovery to transit the state to
'paused' during WAL waiting because 'paused' has more useful information
for users than 'pause requested'.  Returning 'paused' lets users know
clearly that no more WAL are applied until recovery is resumed.  On the
other hand, when 'pause requested' is returned, user can't say whether
the next WAL wiill be applied or not from this information.

For the same reason, I think it is also useful to call recoveryPausesHere
in recoveryApplyDelay. 

In addition, in RecoveryRequiresIntParameter, recovery should get paused
if a parameter value has a problem.  However, pg_get_wal_replay_pause_state
will return 'pause requested' in this case. So, I think, we should pass
RECOVERY_PAUSED to SetRecoveryPause() instead of RECOVERY_PAUSE_REQUESTED,
or call CheckAndSetRecoveryPause() in the loop like recoveryPausesHere().

Regrads,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-01-27 Thread Dilip Kumar
On Wed, Jan 27, 2021 at 2:28 PM Dilip Kumar  wrote:
>
> On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
> >
> > On Wed, 27 Jan 2021 13:29:23 +0530
> > Dilip Kumar  wrote:
> >
> > > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas  
> > > > wrote:
> > > > >
> > > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > > >  wrote:
> > > > > > +1 to just show the recovery pause state in the output of
> > > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > > "pg_is_wal_replay_paused" be something like
> > > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" 
> > > > > > exists
> > > > > > in a function, I expect a boolean output. Others may have better
> > > > > > thoughts.
> > > > >
> > > > > Maybe we should leave the existing function pg_is_wal_replay_paused()
> > > > > alone and add a new one with the name you suggest that returns text.
> > > > > That would create less burden for tool authors.
> > > >
> > > > +1
> > > >
> > >
> > > Yeah, we can do that, I will send an updated patch soon.
> >
> > This means pg_is_wal_replay_paused is left without any change and this
> > returns whether pause is requested or not? If so, it seems good to modify
> > the documentation of this function in order to note that this could not
> > return the actual pause state.
>
> Yes, we can say that it will return true if the replay pause is
> requested.  I am changing that in my new patch.

I have modified the patch, changes

- I have added a new interface pg_get_wal_replay_pause_state to get
the pause request state
- Now, we are not waiting for the recovery to actually get paused so I
think it doesn't make sense to put a lot of checkpoints to check the
pause requested so I have removed that check from the
recoveryApplyDelay but I think it better we still keep that check in
the WaitForWalToBecomeAvailable because it can wait forever before the
next wal get available.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v8-0001-Provide-a-new-interface-to-get-the-recovery-pause.patch
Description: Binary data


Re: Is Recovery actually paused?

2021-01-27 Thread Dilip Kumar
On Wed, Jan 27, 2021 at 2:06 PM Yugo NAGATA  wrote:
>
> On Wed, 27 Jan 2021 13:29:23 +0530
> Dilip Kumar  wrote:
>
> > On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas  wrote:
> > > >
> > > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > > >  wrote:
> > > > > +1 to just show the recovery pause state in the output of
> > > > > pg_is_wal_replay_paused. But, should the function name
> > > > > "pg_is_wal_replay_paused" be something like
> > > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > > > > in a function, I expect a boolean output. Others may have better
> > > > > thoughts.
> > > >
> > > > Maybe we should leave the existing function pg_is_wal_replay_paused()
> > > > alone and add a new one with the name you suggest that returns text.
> > > > That would create less burden for tool authors.
> > >
> > > +1
> > >
> >
> > Yeah, we can do that, I will send an updated patch soon.
>
> This means pg_is_wal_replay_paused is left without any change and this
> returns whether pause is requested or not? If so, it seems good to modify
> the documentation of this function in order to note that this could not
> return the actual pause state.

Yes, we can say that it will return true if the replay pause is
requested.  I am changing that in my new patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-27 Thread Yugo NAGATA
On Wed, 27 Jan 2021 13:29:23 +0530
Dilip Kumar  wrote:

> On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada  
> wrote:
> >
> > On Tue, Jan 26, 2021 at 2:00 AM Robert Haas  wrote:
> > >
> > > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> > >  wrote:
> > > > +1 to just show the recovery pause state in the output of
> > > > pg_is_wal_replay_paused. But, should the function name
> > > > "pg_is_wal_replay_paused" be something like
> > > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > > > in a function, I expect a boolean output. Others may have better
> > > > thoughts.
> > >
> > > Maybe we should leave the existing function pg_is_wal_replay_paused()
> > > alone and add a new one with the name you suggest that returns text.
> > > That would create less burden for tool authors.
> >
> > +1
> >
> 
> Yeah, we can do that, I will send an updated patch soon.

This means pg_is_wal_replay_paused is left without any change and this
returns whether pause is requested or not? If so, it seems good to modify
the documentation of this function in order to note that this could not
return the actual pause state.

Regards,
Yugo Nagata


-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-01-26 Thread Dilip Kumar
On Wed, Jan 27, 2021 at 12:50 PM Masahiko Sawada  wrote:
>
> On Tue, Jan 26, 2021 at 2:00 AM Robert Haas  wrote:
> >
> > On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
> >  wrote:
> > > +1 to just show the recovery pause state in the output of
> > > pg_is_wal_replay_paused. But, should the function name
> > > "pg_is_wal_replay_paused" be something like
> > > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > > in a function, I expect a boolean output. Others may have better
> > > thoughts.
> >
> > Maybe we should leave the existing function pg_is_wal_replay_paused()
> > alone and add a new one with the name you suggest that returns text.
> > That would create less burden for tool authors.
>
> +1
>

Yeah, we can do that, I will send an updated patch soon.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-26 Thread Masahiko Sawada
On Tue, Jan 26, 2021 at 2:00 AM Robert Haas  wrote:
>
> On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
>  wrote:
> > +1 to just show the recovery pause state in the output of
> > pg_is_wal_replay_paused. But, should the function name
> > "pg_is_wal_replay_paused" be something like
> > "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > in a function, I expect a boolean output. Others may have better
> > thoughts.
>
> Maybe we should leave the existing function pg_is_wal_replay_paused()
> alone and add a new one with the name you suggest that returns text.
> That would create less burden for tool authors.

+1

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Is Recovery actually paused?

2021-01-25 Thread Robert Haas
On Sat, Jan 23, 2021 at 6:10 AM Bharath Rupireddy
 wrote:
> +1 to just show the recovery pause state in the output of
> pg_is_wal_replay_paused. But, should the function name
> "pg_is_wal_replay_paused" be something like
> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> in a function, I expect a boolean output. Others may have better
> thoughts.

Maybe we should leave the existing function pg_is_wal_replay_paused()
alone and add a new one with the name you suggest that returns text.
That would create less burden for tool authors.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-25 Thread Yugo NAGATA
On Mon, 25 Jan 2021 14:53:18 +0530
Dilip Kumar  wrote:
 
> I have changed as per other functions for consistency.

Thank you for updating the patch. Here are a few comments:


(1)

-   SetRecoveryPause(true);
+   SetRecoveryPause(RECOVERY_PAUSE_REQUESTED);
 
ereport(LOG
(errmsg("recovery has paused"),
 errdetail("If recovery is unpaused, 
the server will shut down."),
 errhint("You can then restart the 
server after making the necessary configuration changes.")));
 
-   while (RecoveryIsPaused())
+   while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
{
HandleStartupProcInterrupts();

This fix would be required for code added by the following commit. 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=15251c0a60be76eedee74ac0e94b433f9acca5af

Due to this, the recovery could get paused after the configuration
change in the primary. However, after applying this patch,
pg_is_wal_replay_paused  returns "pause requested" although it should
return "paused".

To fix this, we must pass RECOVERY_PAUSED to SetRecoveryPause() instead
of RECOVERY_PAUSE_REQUESTED. Or, we can call CheckAndSetRecoveryPause()
in the loop like recoveryPausesHere(), but this seems redundant.


(2)
-   while (RecoveryIsPaused())
+   while (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)
{
+
HandleStartupProcInterrupts();

Though it is trivial, I think the new line after the brace is unnecessary.


Regards,
Yugo Nagata

-- 
Yugo NAGATA 




Re: Is Recovery actually paused?

2021-01-25 Thread Bharath Rupireddy
On Mon, Jan 25, 2021 at 2:53 PM Dilip Kumar  wrote:
> I have changed as per other functions for consistency.

Thanks for the v7 patch. Here are some quick comments on it:

[1] I think we need to change return value from boolean to text in
documentation:
  pg_is_wal_replay_paused

pg_is_wal_replay_paused ()
boolean
   

[2] Do we intentionally ignore the return type of below function? If
yes, can we change the return type to void and change the function
comment? If we do care about the return value, shouldn't we use it?

static bool recoveryApplyDelay(XLogReaderState *record);
+recoveryApplyDelay(xlogreader);

[3] Although it's not necessary, I just thought, it will be good to
have an example for the new output of pg_is_wal_replay_paused in the
documentation, something like below for brin_page_type:


test=# SELECT brin_page_type(get_raw_page('brinidx', 0));
 brin_page_type

 meta


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-25 Thread Dilip Kumar
On Sun, Jan 24, 2021 at 12:17 PM Bharath Rupireddy
 wrote:
>
> On Sun, Jan 24, 2021 at 11:29 AM Dilip Kumar  wrote:
> > > Some comments on the v6 patch:
>
> > > [2] Typo - it's "requested" + * 'paused requested' - if pause is
> > > reqested but recovery is not yet paused
>
> Here I meant the typo "reqested" in "if pause is reqested but recovery
> is not yet paused" statement from v6 patch.

Ok

> > > [3] I think it's "+ * 'pause requested'" instead of "+ * 'paused 
> > > requested'"
> >
> > Which code does it refer, can give put the snippet from the patch.
> > However, I have found there were 'paused requested' in two places so I
> > have fixed.
>
> Thanks.
>
> > > [6] As I mentioned upthread, isn't it better to have
> > > "IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"?
> >
> > That is an existing function so I think it's fine to keep the same name.
>
> Personally, I think the function RecoveryIsPaused itself is
> unnecessary with the new function GetRecoveryPauseState introduced in
> your patch. IMHO, we can remove it. If not okay, then we are at it,
> can we at least change the function name to be meaningful
> "IsRecoveryPaused"? Others may have better thoughts than me.

I have removed this function

> > > [7] Can we have the function variable name "recoveryPause" as "state"
> > > or "pauseState? Because that variable context is set by the enum name
> > > RecoveryPauseState and the function name.
> > >
> > > +SetRecoveryPause(RecoveryPauseState recoveryPause)
> > >
> > > Here as well, "recoveryPauseState" to "state"?
> > > +GetRecoveryPauseState(void)
> > >  {
> > > -boolrecoveryPause;
> > > +RecoveryPauseStaterecoveryPauseState;
> >
> > I don't think it is required but while changing the patch I will see
> > whether to change or not.
>
> It will be good to change that. I personally don't like structure
> names and variable names to be the same.

Changed to state

> > > [6] Function name RecoveryIsPaused and it's comment "Check whether the
> > > recovery pause is requested." doesn't seem to be matching. Seems like
> > > it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED.
> > > Should it return true only when the state is RECOVERY_PAUSE_REQUESTED?
> >
> > Code is doing right, I will change the comments.
> >
> > > Instead of "while (RecoveryIsPaused())", can't we change it to "while
> > > (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the
> > > RecoveryIsPaused()?
> >
> > I think it looks clean with the function
>
> As I said earlier, I see no use of RecoveryIsPaused() with the
> introduction of the new function GetRecoveryPauseState(). Others may
> have better thoughts than me.
>
> > > [7] Can we change the switch-case in pg_is_wal_replay_paused to
> > > something like below?
> > >
> > > Datum
> > > pg_is_wal_replay_paused(PG_FUNCTION_ARGS)
> > > {
> > > +char   *state;
> > > +/* get the recovery pause state */
> > > +switch(GetRecoveryPauseState())
> > > +{
> > > +case RECOVERY_NOT_PAUSED:
> > > +state = "not paused";
> > > +case RECOVERY_PAUSE_REQUESTED:
> > > +state = "paused requested";
> > > +case RECOVERY_PAUSED:
> > > +state = "paused";
> > > +default:
> > > +elog(ERROR, "invalid recovery pause state");
> > > +}
> > > +
> > > +PG_RETURN_TEXT_P(cstring_to_text(type));
> >
> > Why do you think it is better to use an extra variable?
>
> I see no wrong in having PG_RETURN_TEXT_P and cstring_to_text 's in
> every case statement. But, just to make sure the code looks cleaner, I
> said that we can have a local state variable and just one
> PG_RETURN_TEXT_P(cstring_to_text(state));. See some existing functions
> brin_page_type, hash_page_type, json_typeof,
> pg_stat_get_backend_activity, pg_stat_get_backend_wait_event_type,
> pg_stat_get_backend_wait_event, get_command_type.
>

I have changed as per other functions for consistency.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 5a42c06f866f4e52e136ac46092b3e7f75e9c3ec Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Sat, 23 Jan 2021 11:59:58 +0530
Subject: [PATCH v7] pg_is_wal_replay_paused will return the status of recovery
 pause

Currently, pg_is_wal_replay_paused, just check whether the recovery
pause is requested or not but it doesn't actually tell whether the
recovery is actually paused or not.  So basically there is no way for
the user to know the actual status of the pause request.  With this
patch the API will return the actual status of the recovery pause i.e.
'not paused' if pause is not requested
'pause requested' if pause is requested but recovery is not yet paused
and 'paused' if recovery is actually paused.
---
 doc/src/sgml/func.sgml | 13 +++--
 src/backend/access/transam/xlog.c  | 88 +++---
 src/backend/access/transam/xlogfuncs.c | 35 --
 src/include/access/xlog.h  | 12 -

Re: Is Recovery actually paused?

2021-01-24 Thread Kyotaro Horiguchi
At Mon, 25 Jan 2021 10:05:19 +0530, Dilip Kumar  wrote 
in 
> On Mon, Jan 25, 2021 at 8:42 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar  
> > wrote in
> > > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy
> > >  wrote:
> > > >
> > > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar  
> > > > wrote:
> > > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy 
> > > > >  wrote:
> > > > >>
> > > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  
> > > > >> wrote:
> > > > >> > Please find the patch for the same.  I haven't added a test case 
> > > > >> > for
> > > > >> > this yet.  I mean we can write a test case to pause the recovery 
> > > > >> > and
> > > > >> > get the status.  But I am not sure that we can really write a 
> > > > >> > reliable
> > > > >> > test case for 'pause requested' and 'paused'.
> > > > >>
> > > > >> +1 to just show the recovery pause state in the output of
> > > > >> pg_is_wal_replay_paused. But, should the function name
> > > > >> "pg_is_wal_replay_paused" be something like
> > > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" 
> > > > >> exists
> > > > >> in a function, I expect a boolean output. Others may have better
> > > > >> thoughts.
> > > > >>
> > > > >> IIUC the above change, ensuring the recovery is paused after it's
> > > > >> requested lies with the user. IMHO, the main problem we are trying to
> > > > >> solve is not met
> > > > >
> > > > >
> > > > > Basically earlier their was no way for the user yo know whether the 
> > > > > recovery is actually paused or not because it was always returning 
> > > > > true after pause requested.  Now, we will return whether pause 
> > > > > requested or actually paused.  So > for tool designer who want to 
> > > > > wait for recovery to get paused can have a loop and wait until the 
> > > > > recovery state reaches to paused.  That will give a better control.
> > > >
> > > > I get it and I agree to have that change. My point was whether we can
> > > > have a new function pg_wal_replay_pause_and_wait that waits until
> > > > recovery is actually paused ((along with pg_is_wal_replay_paused
> > > > returning the actual state than a true/false) so that tool developers
> > > > don't need to have the waiting code outside, if at all they care about
> > > > it? Others may have better thoughts than me.
> > >
> > > I think the previous patch was based on that idea where we thought
> > > that we can pass an argument to pg_is_wal_replay_paused which can
> > > decide whether to wait or return without the wait.  I think this
> > > version looks better to me where we give the status instead of
> > > waiting.  I am not sure whether we want another version of
> > > pg_wal_replay_pause which will wait for actually it to get paused.  I
> > > mean there is always a scope to include the functionality in the
> > > database which can be achieved by the tool, but this patch was trying
> > > to solve the problem that there was no way to know the status so I
> > > think returning the correct status should be the scope of this.
> >
> > I understand that the requirement here is that no record is replayed
> > after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused()
> > returns true, and delays taken while recovery don't delay the state
> > change. That requirements are really synchronous.
> >
> > On the other hand the machinery is designed to be asynchronous.
> >
> > >* Note that we intentionally don't take the info_lck spinlock
> > >* here.  We might therefore read a slightly stale value of
> > >* the recoveryPause flag, but it can't be very stale (no
> > >* worse than the last spinlock we did acquire).  Since a
> > >* pause request is a pretty asynchronous thing anyway,
> > >* possibly responding to it one WAL record later than we
> > >* otherwise would is a minor issue, so it doesn't seem worth
> > >* adding another spinlock cycle to prevent that.
> >
> > As the result, this patch tries to introduce several new checkpoints
> > to some delaying point so that that waits can find pause request in a
> > timely manner. I think we had better use locking (or atomics) for the
> > information instead of such scattered checkpoints if we expect that
> > machinery to work in a such syhcnronous manner.
> >
> > That would make the tri-state state variable and many checkpoints
> > unnecessary. Maybe.
> 
> I don't think the intention was so to make it synchronous,  I think
> the main intention was that pg_is_wal_replay_paused can return us the
> correct state, in short user can know that whether any more wal will
> be replayed after pg_is_wal_replay_paused returns true or some other
> state.  I agree that along with that we have also introduced some

I meant that kind of correctness in a time-series by using the word
"synchronous". So it can be implemented both by adopting many
checkpoints and by just makeing the state-change 

Re: Is Recovery actually paused?

2021-01-24 Thread Dilip Kumar
On Mon, Jan 25, 2021 at 8:42 AM Kyotaro Horiguchi
 wrote:
>
> At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar  wrote 
> in
> > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy
> >  wrote:
> > >
> > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar  wrote:
> > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy 
> > > >  wrote:
> > > >>
> > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  
> > > >> wrote:
> > > >> > Please find the patch for the same.  I haven't added a test case for
> > > >> > this yet.  I mean we can write a test case to pause the recovery and
> > > >> > get the status.  But I am not sure that we can really write a 
> > > >> > reliable
> > > >> > test case for 'pause requested' and 'paused'.
> > > >>
> > > >> +1 to just show the recovery pause state in the output of
> > > >> pg_is_wal_replay_paused. But, should the function name
> > > >> "pg_is_wal_replay_paused" be something like
> > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > > >> in a function, I expect a boolean output. Others may have better
> > > >> thoughts.
> > > >>
> > > >> IIUC the above change, ensuring the recovery is paused after it's
> > > >> requested lies with the user. IMHO, the main problem we are trying to
> > > >> solve is not met
> > > >
> > > >
> > > > Basically earlier their was no way for the user yo know whether the 
> > > > recovery is actually paused or not because it was always returning true 
> > > > after pause requested.  Now, we will return whether pause requested or 
> > > > actually paused.  So > for tool designer who want to wait for recovery 
> > > > to get paused can have a loop and wait until the recovery state reaches 
> > > > to paused.  That will give a better control.
> > >
> > > I get it and I agree to have that change. My point was whether we can
> > > have a new function pg_wal_replay_pause_and_wait that waits until
> > > recovery is actually paused ((along with pg_is_wal_replay_paused
> > > returning the actual state than a true/false) so that tool developers
> > > don't need to have the waiting code outside, if at all they care about
> > > it? Others may have better thoughts than me.
> >
> > I think the previous patch was based on that idea where we thought
> > that we can pass an argument to pg_is_wal_replay_paused which can
> > decide whether to wait or return without the wait.  I think this
> > version looks better to me where we give the status instead of
> > waiting.  I am not sure whether we want another version of
> > pg_wal_replay_pause which will wait for actually it to get paused.  I
> > mean there is always a scope to include the functionality in the
> > database which can be achieved by the tool, but this patch was trying
> > to solve the problem that there was no way to know the status so I
> > think returning the correct status should be the scope of this.
>
> I understand that the requirement here is that no record is replayed
> after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused()
> returns true, and delays taken while recovery don't delay the state
> change. That requirements are really synchronous.
>
> On the other hand the machinery is designed to be asynchronous.
>
> >* Note that we intentionally don't take the info_lck spinlock
> >* here.  We might therefore read a slightly stale value of
> >* the recoveryPause flag, but it can't be very stale (no
> >* worse than the last spinlock we did acquire).  Since a
> >* pause request is a pretty asynchronous thing anyway,
> >* possibly responding to it one WAL record later than we
> >* otherwise would is a minor issue, so it doesn't seem worth
> >* adding another spinlock cycle to prevent that.
>
> As the result, this patch tries to introduce several new checkpoints
> to some delaying point so that that waits can find pause request in a
> timely manner. I think we had better use locking (or atomics) for the
> information instead of such scattered checkpoints if we expect that
> machinery to work in a such syhcnronous manner.
>
> That would make the tri-state state variable and many checkpoints
> unnecessary. Maybe.

I don't think the intention was so to make it synchronous,  I think
the main intention was that pg_is_wal_replay_paused can return us the
correct state, in short user can know that whether any more wal will
be replayed after pg_is_wal_replay_paused returns true or some other
state.  I agree that along with that we have also introduced some
extra checkpoint where the recovery process is waiting for WAL and
apply delay and from the pg_wal_replay_pause we had sent a signal to
wakeup the recovery process.  So I am not sure is it worth adding the
lock/atomic variable to make this synchronous.  Any other thoughts on
this?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-24 Thread Dilip Kumar
On Mon, Jan 25, 2021 at 6:12 AM Masahiko Sawada  wrote:
>
> On Sun, Jan 17, 2021 at 5:19 PM Dilip Kumar  wrote:
> >
> > On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada  
> > wrote:
> > >
> > > On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  
> > > > wrote:
> > > > >
> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > > > > Dilip Kumar  wrote:
> > > > > >
> > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused 
> > > > > > > > > to wait.
> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will 
> > > > > > > > > be blocked forever,
> > > > > > > > > although this setting may not be usual. In addition, some 
> > > > > > > > > users may set
> > > > > > > > > recovery_min_apply_delay for a large.  If such users call 
> > > > > > > > > pg_is_wal_replay_paused,
> > > > > > > > > it could wait for a long time.
> > > > > > > > >
> > > > > > > > > At least, I think we need some descriptions on document to 
> > > > > > > > > explain
> > > > > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > > > > >
> > > > > > > > Ok
> > > > > > >
> > > > > > > Fixed this, added some comments in .sgml as well as in function 
> > > > > > > header
> > > > > >
> > > > > > Thank you for fixing this.
> > > > > >
> > > > > > Also, is it better to fix the description of pg_wal_replay_pause 
> > > > > > from
> > > > > > "Pauses recovery." to "Request to pause recovery." in according with
> > > > > > pg_is_wal_replay_paused?
> > > > >
> > > > > Okay
> > > > >
> > > > > >
> > > > > > > > > Also, how about adding a new boolean argument to 
> > > > > > > > > pg_is_wal_replay_paused to
> > > > > > > > > control whether this waits for recovery to get paused or not? 
> > > > > > > > > By setting its
> > > > > > > > > default value to true or false, users can use the old format 
> > > > > > > > > for calling this
> > > > > > > > > and the backward compatibility can be maintained.
> > > > > > > >
> > > > > > > > So basically, if the wait_recovery_pause flag is false then we 
> > > > > > > > will
> > > > > > > > immediately return true if the pause is requested?  I agree 
> > > > > > > > that it is
> > > > > > > > good to have an API to know whether the recovery pause is 
> > > > > > > > requested or
> > > > > > > > not but I am not sure is it good idea to make this API serve 
> > > > > > > > both the
> > > > > > > > purpose?  Anyone else have any thoughts on this?
> > > > > > > >
> > > > > >
> > > > > > I think the current pg_is_wal_replay_paused() already has another 
> > > > > > purpose;
> > > > > > this waits recovery to actually get paused. If we want to limit 
> > > > > > this API's
> > > > > > purpose only to return the pause state, it seems better to fix this 
> > > > > > to return
> > > > > > the actual state at the cost of lacking the backward compatibility. 
> > > > > > If we want
> > > > > > to know whether pause is requested, we may add a new API like
> > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait 
> > > > > > recovery to actually
> > > > > > get paused, we may add an option to pg_wal_replay_pause() for this 
> > > > > > purpose.
> > > > > >
> > > > > > However, this might be a bikeshedding. If anyone don't care that
> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I 
> > > > > > don't care either.
> > > > >
> > > > > I don't think that it will be blocked ever, because
> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > > > recovery process will not be stuck on waiting for the WAL.
> > > > >
> > > > > > > > > As another comment, while pg_is_wal_replay_paused is 
> > > > > > > > > blocking, I can not cancel
> > > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the 
> > > > > > > > > waiting loop.
> > > > > >
> > > > > > How about this fix? I think users may want to cancel 
> > > > > > pg_is_wal_replay_paused() during
> > > > > > this is blocking.
> > > > >
> > > > > Yeah, we can do this.  I will send the updated patch after putting
> > > > > some more thought into these comments.  Thanks again for the feedback.
> > > > >
> > > >
> > > > Please find the updated patch.
> > >
> > > I've looked at the patch. Here are review comments:
> > >
> > > +   /* Recovery pause state */
> > > +   RecoveryPauseState  recoveryPause;
> > >
> > > Now that the value can have tri-state, how about renaming it to
> > > recoveryPauseState?
> >
> > This makes sense to me.
> >
> > > ---
> > >  bool
> > >  RecoveryIsPaused(void)
> > > +{
> > > +   boolrecoveryPause;
> > > +
> > > +   SpinLockAcquire(>info_lck);
> > > +   recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ?
> > > true : false;
> > > +   SpinLockRelease(>info_lck);
> > > +
> > > +   return recoveryPause;
> > > +}
> > > +
> > > +bool
> 

Re: Is Recovery actually paused?

2021-01-24 Thread Kyotaro Horiguchi
At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar  wrote 
in 
> On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy
>  wrote:
> >
> > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar  wrote:
> > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy 
> > >  wrote:
> > >>
> > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  
> > >> wrote:
> > >> > Please find the patch for the same.  I haven't added a test case for
> > >> > this yet.  I mean we can write a test case to pause the recovery and
> > >> > get the status.  But I am not sure that we can really write a reliable
> > >> > test case for 'pause requested' and 'paused'.
> > >>
> > >> +1 to just show the recovery pause state in the output of
> > >> pg_is_wal_replay_paused. But, should the function name
> > >> "pg_is_wal_replay_paused" be something like
> > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> > >> in a function, I expect a boolean output. Others may have better
> > >> thoughts.
> > >>
> > >> IIUC the above change, ensuring the recovery is paused after it's
> > >> requested lies with the user. IMHO, the main problem we are trying to
> > >> solve is not met
> > >
> > >
> > > Basically earlier their was no way for the user yo know whether the 
> > > recovery is actually paused or not because it was always returning true 
> > > after pause requested.  Now, we will return whether pause requested or 
> > > actually paused.  So > for tool designer who want to wait for recovery to 
> > > get paused can have a loop and wait until the recovery state reaches to 
> > > paused.  That will give a better control.
> >
> > I get it and I agree to have that change. My point was whether we can
> > have a new function pg_wal_replay_pause_and_wait that waits until
> > recovery is actually paused ((along with pg_is_wal_replay_paused
> > returning the actual state than a true/false) so that tool developers
> > don't need to have the waiting code outside, if at all they care about
> > it? Others may have better thoughts than me.
> 
> I think the previous patch was based on that idea where we thought
> that we can pass an argument to pg_is_wal_replay_paused which can
> decide whether to wait or return without the wait.  I think this
> version looks better to me where we give the status instead of
> waiting.  I am not sure whether we want another version of
> pg_wal_replay_pause which will wait for actually it to get paused.  I
> mean there is always a scope to include the functionality in the
> database which can be achieved by the tool, but this patch was trying
> to solve the problem that there was no way to know the status so I
> think returning the correct status should be the scope of this.

I understand that the requirement here is that no record is replayed
after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused()
returns true, and delays taken while recovery don't delay the state
change. That requirements are really synchronous.

On the other hand the machinery is designed to be asynchronous.

>* Note that we intentionally don't take the info_lck spinlock
>* here.  We might therefore read a slightly stale value of
>* the recoveryPause flag, but it can't be very stale (no
>* worse than the last spinlock we did acquire).  Since a
>* pause request is a pretty asynchronous thing anyway,
>* possibly responding to it one WAL record later than we
>* otherwise would is a minor issue, so it doesn't seem worth
>* adding another spinlock cycle to prevent that.

As the result, this patch tries to introduce several new checkpoints
to some delaying point so that that waits can find pause request in a
timely manner. I think we had better use locking (or atomics) for the
information instead of such scattered checkpoints if we expect that
machinery to work in a such syhcnronous manner.

That would make the tri-state state variable and many checkpoints
unnecessary. Maybe.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-01-24 Thread Masahiko Sawada
On Sun, Jan 17, 2021 at 5:19 PM Dilip Kumar  wrote:
>
> On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada  wrote:
> >
> > On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > > > Dilip Kumar  wrote:
> > > > >
> > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to 
> > > > > > > > wait.
> > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be 
> > > > > > > > blocked forever,
> > > > > > > > although this setting may not be usual. In addition, some users 
> > > > > > > > may set
> > > > > > > > recovery_min_apply_delay for a large.  If such users call 
> > > > > > > > pg_is_wal_replay_paused,
> > > > > > > > it could wait for a long time.
> > > > > > > >
> > > > > > > > At least, I think we need some descriptions on document to 
> > > > > > > > explain
> > > > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > > > >
> > > > > > > Ok
> > > > > >
> > > > > > Fixed this, added some comments in .sgml as well as in function 
> > > > > > header
> > > > >
> > > > > Thank you for fixing this.
> > > > >
> > > > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > > > "Pauses recovery." to "Request to pause recovery." in according with
> > > > > pg_is_wal_replay_paused?
> > > >
> > > > Okay
> > > >
> > > > >
> > > > > > > > Also, how about adding a new boolean argument to 
> > > > > > > > pg_is_wal_replay_paused to
> > > > > > > > control whether this waits for recovery to get paused or not? 
> > > > > > > > By setting its
> > > > > > > > default value to true or false, users can use the old format 
> > > > > > > > for calling this
> > > > > > > > and the backward compatibility can be maintained.
> > > > > > >
> > > > > > > So basically, if the wait_recovery_pause flag is false then we 
> > > > > > > will
> > > > > > > immediately return true if the pause is requested?  I agree that 
> > > > > > > it is
> > > > > > > good to have an API to know whether the recovery pause is 
> > > > > > > requested or
> > > > > > > not but I am not sure is it good idea to make this API serve both 
> > > > > > > the
> > > > > > > purpose?  Anyone else have any thoughts on this?
> > > > > > >
> > > > >
> > > > > I think the current pg_is_wal_replay_paused() already has another 
> > > > > purpose;
> > > > > this waits recovery to actually get paused. If we want to limit this 
> > > > > API's
> > > > > purpose only to return the pause state, it seems better to fix this 
> > > > > to return
> > > > > the actual state at the cost of lacking the backward compatibility. 
> > > > > If we want
> > > > > to know whether pause is requested, we may add a new API like
> > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait 
> > > > > recovery to actually
> > > > > get paused, we may add an option to pg_wal_replay_pause() for this 
> > > > > purpose.
> > > > >
> > > > > However, this might be a bikeshedding. If anyone don't care that
> > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't 
> > > > > care either.
> > > >
> > > > I don't think that it will be blocked ever, because
> > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > > recovery process will not be stuck on waiting for the WAL.
> > > >
> > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, 
> > > > > > > > I can not cancel
> > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the 
> > > > > > > > waiting loop.
> > > > >
> > > > > How about this fix? I think users may want to cancel 
> > > > > pg_is_wal_replay_paused() during
> > > > > this is blocking.
> > > >
> > > > Yeah, we can do this.  I will send the updated patch after putting
> > > > some more thought into these comments.  Thanks again for the feedback.
> > > >
> > >
> > > Please find the updated patch.
> >
> > I've looked at the patch. Here are review comments:
> >
> > +   /* Recovery pause state */
> > +   RecoveryPauseState  recoveryPause;
> >
> > Now that the value can have tri-state, how about renaming it to
> > recoveryPauseState?
>
> This makes sense to me.
>
> > ---
> >  bool
> >  RecoveryIsPaused(void)
> > +{
> > +   boolrecoveryPause;
> > +
> > +   SpinLockAcquire(>info_lck);
> > +   recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ?
> > true : false;
> > +   SpinLockRelease(>info_lck);
> > +
> > +   return recoveryPause;
> > +}
> > +
> > +bool
> > +RecoveryPauseRequested(void)
> >  {
> > boolrecoveryPause;
> >
> > SpinLockAcquire(>info_lck);
> > -   recoveryPause = XLogCtl->recoveryPause;
> > +   recoveryPause = (XLogCtl->recoveryPause !=
> > RECOVERY_IN_PROGRESS) ? true : false;
> > SpinLockRelease(>info_lck);
> >
> > return 

Re: Is Recovery actually paused?

2021-01-24 Thread Dilip Kumar
On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy
 wrote:
>
> On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar  wrote:
> > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy 
> >  wrote:
> >>
> >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  wrote:
> >> > Please find the patch for the same.  I haven't added a test case for
> >> > this yet.  I mean we can write a test case to pause the recovery and
> >> > get the status.  But I am not sure that we can really write a reliable
> >> > test case for 'pause requested' and 'paused'.
> >>
> >> +1 to just show the recovery pause state in the output of
> >> pg_is_wal_replay_paused. But, should the function name
> >> "pg_is_wal_replay_paused" be something like
> >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> >> in a function, I expect a boolean output. Others may have better
> >> thoughts.
> >>
> >> IIUC the above change, ensuring the recovery is paused after it's
> >> requested lies with the user. IMHO, the main problem we are trying to
> >> solve is not met
> >
> >
> > Basically earlier their was no way for the user yo know whether the 
> > recovery is actually paused or not because it was always returning true 
> > after pause requested.  Now, we will return whether pause requested or 
> > actually paused.  So > for tool designer who want to wait for recovery to 
> > get paused can have a loop and wait until the recovery state reaches to 
> > paused.  That will give a better control.
>
> I get it and I agree to have that change. My point was whether we can
> have a new function pg_wal_replay_pause_and_wait that waits until
> recovery is actually paused ((along with pg_is_wal_replay_paused
> returning the actual state than a true/false) so that tool developers
> don't need to have the waiting code outside, if at all they care about
> it? Others may have better thoughts than me.

I think the previous patch was based on that idea where we thought
that we can pass an argument to pg_is_wal_replay_paused which can
decide whether to wait or return without the wait.  I think this
version looks better to me where we give the status instead of
waiting.  I am not sure whether we want another version of
pg_wal_replay_pause which will wait for actually it to get paused.  I
mean there is always a scope to include the functionality in the
database which can be achieved by the tool, but this patch was trying
to solve the problem that there was no way to know the status so I
think returning the correct status should be the scope of this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-23 Thread Bharath Rupireddy
On Sun, Jan 24, 2021 at 11:29 AM Dilip Kumar  wrote:
> > Some comments on the v6 patch:

> > [2] Typo - it's "requested" + * 'paused requested' - if pause is
> > reqested but recovery is not yet paused

Here I meant the typo "reqested" in "if pause is reqested but recovery
is not yet paused" statement from v6 patch.

> > [3] I think it's "+ * 'pause requested'" instead of "+ * 'paused requested'"
>
> Which code does it refer, can give put the snippet from the patch.
> However, I have found there were 'paused requested' in two places so I
> have fixed.

Thanks.

> > [6] As I mentioned upthread, isn't it better to have
> > "IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"?
>
> That is an existing function so I think it's fine to keep the same name.

Personally, I think the function RecoveryIsPaused itself is
unnecessary with the new function GetRecoveryPauseState introduced in
your patch. IMHO, we can remove it. If not okay, then we are at it,
can we at least change the function name to be meaningful
"IsRecoveryPaused"? Others may have better thoughts than me.

> > [7] Can we have the function variable name "recoveryPause" as "state"
> > or "pauseState? Because that variable context is set by the enum name
> > RecoveryPauseState and the function name.
> >
> > +SetRecoveryPause(RecoveryPauseState recoveryPause)
> >
> > Here as well, "recoveryPauseState" to "state"?
> > +GetRecoveryPauseState(void)
> >  {
> > -boolrecoveryPause;
> > +RecoveryPauseStaterecoveryPauseState;
>
> I don't think it is required but while changing the patch I will see
> whether to change or not.

It will be good to change that. I personally don't like structure
names and variable names to be the same.

> > [6] Function name RecoveryIsPaused and it's comment "Check whether the
> > recovery pause is requested." doesn't seem to be matching. Seems like
> > it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED.
> > Should it return true only when the state is RECOVERY_PAUSE_REQUESTED?
>
> Code is doing right, I will change the comments.
>
> > Instead of "while (RecoveryIsPaused())", can't we change it to "while
> > (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the
> > RecoveryIsPaused()?
>
> I think it looks clean with the function

As I said earlier, I see no use of RecoveryIsPaused() with the
introduction of the new function GetRecoveryPauseState(). Others may
have better thoughts than me.

> > [7] Can we change the switch-case in pg_is_wal_replay_paused to
> > something like below?
> >
> > Datum
> > pg_is_wal_replay_paused(PG_FUNCTION_ARGS)
> > {
> > +char   *state;
> > +/* get the recovery pause state */
> > +switch(GetRecoveryPauseState())
> > +{
> > +case RECOVERY_NOT_PAUSED:
> > +state = "not paused";
> > +case RECOVERY_PAUSE_REQUESTED:
> > +state = "paused requested";
> > +case RECOVERY_PAUSED:
> > +state = "paused";
> > +default:
> > +elog(ERROR, "invalid recovery pause state");
> > +}
> > +
> > +PG_RETURN_TEXT_P(cstring_to_text(type));
>
> Why do you think it is better to use an extra variable?

I see no wrong in having PG_RETURN_TEXT_P and cstring_to_text 's in
every case statement. But, just to make sure the code looks cleaner, I
said that we can have a local state variable and just one
PG_RETURN_TEXT_P(cstring_to_text(state));. See some existing functions
brin_page_type, hash_page_type, json_typeof,
pg_stat_get_backend_activity, pg_stat_get_backend_wait_event_type,
pg_stat_get_backend_wait_event, get_command_type.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-23 Thread Bharath Rupireddy
On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar  wrote:
> On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy 
>  wrote:
>>
>> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  wrote:
>> > Please find the patch for the same.  I haven't added a test case for
>> > this yet.  I mean we can write a test case to pause the recovery and
>> > get the status.  But I am not sure that we can really write a reliable
>> > test case for 'pause requested' and 'paused'.
>>
>> +1 to just show the recovery pause state in the output of
>> pg_is_wal_replay_paused. But, should the function name
>> "pg_is_wal_replay_paused" be something like
>> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
>> in a function, I expect a boolean output. Others may have better
>> thoughts.
>>
>> IIUC the above change, ensuring the recovery is paused after it's
>> requested lies with the user. IMHO, the main problem we are trying to
>> solve is not met
>
>
> Basically earlier their was no way for the user yo know whether the recovery 
> is actually paused or not because it was always returning true after pause 
> requested.  Now, we will return whether pause requested or actually paused.  
> So > for tool designer who want to wait for recovery to get paused can have a 
> loop and wait until the recovery state reaches to paused.  That will give a 
> better control.

I get it and I agree to have that change. My point was whether we can
have a new function pg_wal_replay_pause_and_wait that waits until
recovery is actually paused ((along with pg_is_wal_replay_paused
returning the actual state than a true/false) so that tool developers
don't need to have the waiting code outside, if at all they care about
it? Others may have better thoughts than me.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-23 Thread Dilip Kumar
On Sat, Jan 23, 2021 at 4:40 PM Bharath Rupireddy
 wrote:
>
> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  wrote:
> > Please find the patch for the same.  I haven't added a test case for
> > this yet.  I mean we can write a test case to pause the recovery and
> > get the status.  But I am not sure that we can really write a reliable
> > test case for 'pause requested' and 'paused'.
>
> +1 to just show the recovery pause state in the output of
> pg_is_wal_replay_paused. But, should the function name
> "pg_is_wal_replay_paused" be something like
> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> in a function, I expect a boolean output. Others may have better
> thoughts.

I am fine with the name change, but don't feel that it will be
completely wrong if pg_is_wal_replay_paused returns a different state
of the recovery pause.
So I would like to see what others thinks and based on that we can decide.

> IIUC the above change, ensuring the recovery is paused after it's
> requested lies with the user. IMHO, the main problem we are trying to
> solve is not met. Isn't it better if we have a new function(wait
> version) along with the above change to pg_is_wal_replay_paused,
> something like "pg_wal_replay_pause_and_wait" returning true or false?
> The functionality is pg_wal_replay_pause + wait until it's actually
> paused.
>
> Thoughts?

Already replied in the last mail.

> Some comments on the v6 patch:
>
> [1] How about
> + * This function returns the current state of the recovery pause.
> instead of
> + * This api will return the current state of the recovery pause.

Okay

> [2] Typo - it's "requested" + * 'paused requested' - if pause is
> reqested but recovery is not yet paused
>
> [3] I think it's "+ * 'pause requested'" instead of "+ * 'paused requested'"

Which code does it refer, can give put the snippet from the patch.
However, I have found there were 'paused requested' in two places so I
have fixed.

> [4] Isn't it good to have an example usage and output of the function
> in the documentaion?
> +Returns recovery pause status, which is not
> paused if
> +pause is not requested, pause requested if pause 
> is
> +requested but recovery is not yet paused and,
> paused if
> +the recovery is actually paused.
> 

I will add.

> [5] Is it
> + * Wait until shared recoveryPause state is set to RECOVERY_NOT_PAUSED.
> instead of
> + * Wait until shared recoveryPause is set to RECOVERY_NOT_PAUSED.

Ok

> [6] As I mentioned upthread, isn't it better to have
> "IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"?

That is an existing function so I think it's fine to keep the same name.

> [7] Can we have the function variable name "recoveryPause" as "state"
> or "pauseState? Because that variable context is set by the enum name
> RecoveryPauseState and the function name.
>
> +SetRecoveryPause(RecoveryPauseState recoveryPause)
>
> Here as well, "recoveryPauseState" to "state"?
> +GetRecoveryPauseState(void)
>  {
> -boolrecoveryPause;
> +RecoveryPauseStaterecoveryPauseState;

I don't think it is required but while changing the patch I will see
whether to change or not.

> [6] Function name RecoveryIsPaused and it's comment "Check whether the
> recovery pause is requested." doesn't seem to be matching. Seems like
> it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED.
> Should it return true only when the state is RECOVERY_PAUSE_REQUESTED?

Code is doing right, I will change the comments.

> Instead of "while (RecoveryIsPaused())", can't we change it to "while
> (GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the
> RecoveryIsPaused()?

I think it looks clean with the function

> [7] Can we change the switch-case in pg_is_wal_replay_paused to
> something like below?
>
> Datum
> pg_is_wal_replay_paused(PG_FUNCTION_ARGS)
> {
> +char   *state;
> +/* get the recovery pause state */
> +switch(GetRecoveryPauseState())
> +{
> +case RECOVERY_NOT_PAUSED:
> +state = "not paused";
> +case RECOVERY_PAUSE_REQUESTED:
> +state = "paused requested";
> +case RECOVERY_PAUSED:
> +state = "paused";
> +default:
> +elog(ERROR, "invalid recovery pause state");
> +}
> +
> +PG_RETURN_TEXT_P(cstring_to_text(type));

Why do you think it is better to use an extra variable?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-23 Thread Dilip Kumar
On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  wrote:
> > Please find the patch for the same.  I haven't added a test case for
> > this yet.  I mean we can write a test case to pause the recovery and
> > get the status.  But I am not sure that we can really write a reliable
> > test case for 'pause requested' and 'paused'.
>
> +1 to just show the recovery pause state in the output of
> pg_is_wal_replay_paused. But, should the function name
> "pg_is_wal_replay_paused" be something like
> "pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
> in a function, I expect a boolean output. Others may have better
> thoughts.
>
> IIUC the above change, ensuring the recovery is paused after it's
> requested lies with the user. IMHO, the main problem we are trying to
> solve is not met


Basically earlier their was no way for the user yo know whether the
recovery is actually paused or not because it was always returning true
after pause requested.  Now, we will return whether pause requested or
actually paused.  So for tool designer who want to wait for recovery to get
paused can have a loop and wait until the recovery state reaches to
paused.  That will give a better control.

I will check other comments and respond along with the patch.

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Is Recovery actually paused?

2021-01-23 Thread Bharath Rupireddy
On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar  wrote:
> Please find the patch for the same.  I haven't added a test case for
> this yet.  I mean we can write a test case to pause the recovery and
> get the status.  But I am not sure that we can really write a reliable
> test case for 'pause requested' and 'paused'.

+1 to just show the recovery pause state in the output of
pg_is_wal_replay_paused. But, should the function name
"pg_is_wal_replay_paused" be something like
"pg_get_wal_replay_pause_state" or some other? To me, when "is" exists
in a function, I expect a boolean output. Others may have better
thoughts.

IIUC the above change, ensuring the recovery is paused after it's
requested lies with the user. IMHO, the main problem we are trying to
solve is not met. Isn't it better if we have a new function(wait
version) along with the above change to pg_is_wal_replay_paused,
something like "pg_wal_replay_pause_and_wait" returning true or false?
The functionality is pg_wal_replay_pause + wait until it's actually
paused.

Thoughts?

Some comments on the v6 patch:

[1] How about
+ * This function returns the current state of the recovery pause.
instead of
+ * This api will return the current state of the recovery pause.

[2] Typo - it's "requested" + * 'paused requested' - if pause is
reqested but recovery is not yet paused

[3] I think it's "+ * 'pause requested'" instead of "+ * 'paused requested'"

[4] Isn't it good to have an example usage and output of the function
in the documentaion?
+Returns recovery pause status, which is not
paused if
+pause is not requested, pause requested if pause is
+requested but recovery is not yet paused and,
paused if
+the recovery is actually paused.


[5] Is it
+ * Wait until shared recoveryPause state is set to RECOVERY_NOT_PAUSED.
instead of
+ * Wait until shared recoveryPause is set to RECOVERY_NOT_PAUSED.

[6] As I mentioned upthread, isn't it better to have
"IsRecoveryPaused(void)" than "RecoveryIsPaused(void)"?

[7] Can we have the function variable name "recoveryPause" as "state"
or "pauseState? Because that variable context is set by the enum name
RecoveryPauseState and the function name.

+SetRecoveryPause(RecoveryPauseState recoveryPause)

Here as well, "recoveryPauseState" to "state"?
+GetRecoveryPauseState(void)
 {
-boolrecoveryPause;
+RecoveryPauseStaterecoveryPauseState;

[6] Function name RecoveryIsPaused and it's comment "Check whether the
recovery pause is requested." doesn't seem to be matching. Seems like
it returns true even when RECOVERY_PAUSE_REQUESTED or RECOVERY_PAUSED.
Should it return true only when the state is RECOVERY_PAUSE_REQUESTED?

Instead of "while (RecoveryIsPaused())", can't we change it to "while
(GetRecoveryPauseState() != RECOVERY_NOT_PAUSED)" and remove the
RecoveryIsPaused()?

[7] Can we change the switch-case in pg_is_wal_replay_paused to
something like below?

Datum
pg_is_wal_replay_paused(PG_FUNCTION_ARGS)
{
+char   *state;
+/* get the recovery pause state */
+switch(GetRecoveryPauseState())
+{
+case RECOVERY_NOT_PAUSED:
+state = "not paused";
+case RECOVERY_PAUSE_REQUESTED:
+state = "paused requested";
+case RECOVERY_PAUSED:
+state = "paused";
+default:
+elog(ERROR, "invalid recovery pause state");
+}
+
+PG_RETURN_TEXT_P(cstring_to_text(type));

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-23 Thread Dilip Kumar
On Sat, Jan 23, 2021 at 9:56 AM Dilip Kumar  wrote:
>
> On Fri, Jan 22, 2021 at 2:18 AM Robert Haas  wrote:
> >
> > On Mon, Jan 18, 2021 at 9:42 PM Yugo NAGATA  wrote:
> > > If it is acceptable that pg_is_wal_replay_paused() makes users wait,
> > > I'm ok for the current interface. I don't feel the need of
> > > pg_is_wal_replay_paluse_requeseted().
> >
> > Another idea could be that pg_is_wal_replay_paused() could be changed
> > to text, and the string could be either 'paused' or 'pause requested'
> > or 'not paused'. That way we'd be returning a direct representation of
> > the state we're keeping in memory. Some of the complexity in this
> > discussion seems to come from trying to squeeze 3 possibilities into a
> > Boolean.
> >
> > Let's also consider that we don't really know whether the client wants
> > us to wait or not, and different clients may want different things, or
> > maybe not, but we don't really know at this point. If we provide an
> > interface that waits, and the client doesn't want to wait but just
> > know the current state, they don't necessarily have any great options.
> > If we provide an interface that doesn't wait, and the client wants to
> > wait, it can poll until it gets the answer it wants. Polling can be
> > inefficient, but anybody who is writing a tool that uses this should
> > be able to manage an algorithm with some reasonable back-off behavior
> > (e.g. try after 10ms, 20ms, keep doubling, max of 5s, or something of
> > that sort), so I'm not sure there's actually any real problem in
> > practice. So to me it seems more likely that an interface that is
> > based on waiting will cause difficulty for tool-writers than one that
> > does not.
> >
> > Other people may feel differently, of course...
>
> I think this is the better way of handling this.  So +1 from my side,
> I will send an updated patch.

Please find the patch for the same.  I haven't added a test case for
this yet.  I mean we can write a test case to pause the recovery and
get the status.  But I am not sure that we can really write a reliable
test case for 'pause requested' and 'paused'.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v6-0001-pg_is_wal_replay_paused-will-return-the-status-of.patch
Description: Binary data


Re: Is Recovery actually paused?

2021-01-22 Thread Dilip Kumar
On Thu, Jan 21, 2021 at 6:20 PM Yugo NAGATA  wrote:
>
> On Tue, 19 Jan 2021 21:32:31 +0530
> Dilip Kumar  wrote:
>
> > On Tue, Jan 19, 2021 at 8:34 AM Dilip Kumar  wrote:
> > >
> > > On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA  wrote:
> > >>
> > >> On Sun, 17 Jan 2021 11:33:52 +0530
> > >> Dilip Kumar  wrote:
> > >>
> > >> > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA  
> > >> > wrote:
> > >> > >
> > >> > > On Wed, 13 Jan 2021 17:49:43 +0530
> > >> > > Dilip Kumar  wrote:
> > >> > >
> > >> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar 
> > >> > > >  wrote:
> > >> > > > >
> > >> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA 
> > >> > > > >  wrote:
> > >> > > > > >
> > >> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530
> > >> > > > > > Dilip Kumar  wrote:
> > >> > > > > >
> > >> > > > > > > > > However, I wonder users don't expect 
> > >> > > > > > > > > pg_is_wal_replay_paused to wait.
> > >> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this 
> > >> > > > > > > > > will be blocked forever,
> > >> > > > > > > > > although this setting may not be usual. In addition, 
> > >> > > > > > > > > some users may set
> > >> > > > > > > > > recovery_min_apply_delay for a large.  If such users 
> > >> > > > > > > > > call pg_is_wal_replay_paused,
> > >> > > > > > > > > it could wait for a long time.
> > >> > > > > > > > >
> > >> > > > > > > > > At least, I think we need some descriptions on document 
> > >> > > > > > > > > to explain
> > >> > > > > > > > > pg_is_wal_replay_paused could wait while a time.
> > >> > > > > > > >
> > >> > > > > > > > Ok
> > >> > > > > > >
> > >> > > > > > > Fixed this, added some comments in .sgml as well as in 
> > >> > > > > > > function header
> > >> > > > > >
> > >> > > > > > Thank you for fixing this.
> > >> > > > > >
> > >> > > > > > Also, is it better to fix the description of 
> > >> > > > > > pg_wal_replay_pause from
> > >> > > > > > "Pauses recovery." to "Request to pause recovery." in 
> > >> > > > > > according with
> > >> > > > > > pg_is_wal_replay_paused?
> > >> > > > >
> > >> > > > > Okay
> > >> > > > >
> > >> > > > > >
> > >> > > > > > > > > Also, how about adding a new boolean argument to 
> > >> > > > > > > > > pg_is_wal_replay_paused to
> > >> > > > > > > > > control whether this waits for recovery to get paused or 
> > >> > > > > > > > > not? By setting its
> > >> > > > > > > > > default value to true or false, users can use the old 
> > >> > > > > > > > > format for calling this
> > >> > > > > > > > > and the backward compatibility can be maintained.
> > >> > > > > > > >
> > >> > > > > > > > So basically, if the wait_recovery_pause flag is false 
> > >> > > > > > > > then we will
> > >> > > > > > > > immediately return true if the pause is requested?  I 
> > >> > > > > > > > agree that it is
> > >> > > > > > > > good to have an API to know whether the recovery pause is 
> > >> > > > > > > > requested or
> > >> > > > > > > > not but I am not sure is it good idea to make this API 
> > >> > > > > > > > serve both the
> > >> > > > > > > > purpose?  Anyone else have any thoughts on this?
> > >> > > > > > > >
> > >> > > > > >
> > >> > > > > > I think the current pg_is_wal_replay_paused() already has 
> > >> > > > > > another purpose;
> > >> > > > > > this waits recovery to actually get paused. If we want to 
> > >> > > > > > limit this API's
> > >> > > > > > purpose only to return the pause state, it seems better to fix 
> > >> > > > > > this to return
> > >> > > > > > the actual state at the cost of lacking the backward 
> > >> > > > > > compatibility. If we want
> > >> > > > > > to know whether pause is requested, we may add a new API like
> > >> > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait 
> > >> > > > > > recovery to actually
> > >> > > > > > get paused, we may add an option to pg_wal_replay_pause() for 
> > >> > > > > > this purpose.
> > >> > > > > >
> > >> > > > > > However, this might be a bikeshedding. If anyone don't care 
> > >> > > > > > that
> > >> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, 
> > >> > > > > > I don't care either.
> > >> > > > >
> > >> > > > > I don't think that it will be blocked ever, because
> > >> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means 
> > >> > > > > the
> > >> > > > > recovery process will not be stuck on waiting for the WAL.
> > >> > >
> > >> > > Yes, there is no stuck on waiting for the WAL. However, it can be 
> > >> > > stuck during resolving
> > >> > > a recovery conflict. The process could wait for 
> > >> > > max_standby_streaming_delay or
> > >> > > max_standby_archive_delay at most before recovery get completely 
> > >> > > paused.
> > >> >
> > >> > Okay, I agree that it is possible so for handling this we have a
> > >> > couple of options
> > >> > 1. pg_is_wal_replay_paused(), interface will wait for recovery to
> > >> > actually get paused, but user have an option to cancel that.  So I
> > >> > agree that there is 

Re: Is Recovery actually paused?

2021-01-22 Thread Dilip Kumar
On Fri, Jan 22, 2021 at 2:18 AM Robert Haas  wrote:
>
> On Mon, Jan 18, 2021 at 9:42 PM Yugo NAGATA  wrote:
> > If it is acceptable that pg_is_wal_replay_paused() makes users wait,
> > I'm ok for the current interface. I don't feel the need of
> > pg_is_wal_replay_paluse_requeseted().
>
> Another idea could be that pg_is_wal_replay_paused() could be changed
> to text, and the string could be either 'paused' or 'pause requested'
> or 'not paused'. That way we'd be returning a direct representation of
> the state we're keeping in memory. Some of the complexity in this
> discussion seems to come from trying to squeeze 3 possibilities into a
> Boolean.
>
> Let's also consider that we don't really know whether the client wants
> us to wait or not, and different clients may want different things, or
> maybe not, but we don't really know at this point. If we provide an
> interface that waits, and the client doesn't want to wait but just
> know the current state, they don't necessarily have any great options.
> If we provide an interface that doesn't wait, and the client wants to
> wait, it can poll until it gets the answer it wants. Polling can be
> inefficient, but anybody who is writing a tool that uses this should
> be able to manage an algorithm with some reasonable back-off behavior
> (e.g. try after 10ms, 20ms, keep doubling, max of 5s, or something of
> that sort), so I'm not sure there's actually any real problem in
> practice. So to me it seems more likely that an interface that is
> based on waiting will cause difficulty for tool-writers than one that
> does not.
>
> Other people may feel differently, of course...

I think this is the better way of handling this.  So +1 from my side,
I will send an updated patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-21 Thread Robert Haas
On Mon, Jan 18, 2021 at 9:42 PM Yugo NAGATA  wrote:
> If it is acceptable that pg_is_wal_replay_paused() makes users wait,
> I'm ok for the current interface. I don't feel the need of
> pg_is_wal_replay_paluse_requeseted().

Another idea could be that pg_is_wal_replay_paused() could be changed
to text, and the string could be either 'paused' or 'pause requested'
or 'not paused'. That way we'd be returning a direct representation of
the state we're keeping in memory. Some of the complexity in this
discussion seems to come from trying to squeeze 3 possibilities into a
Boolean.

Let's also consider that we don't really know whether the client wants
us to wait or not, and different clients may want different things, or
maybe not, but we don't really know at this point. If we provide an
interface that waits, and the client doesn't want to wait but just
know the current state, they don't necessarily have any great options.
If we provide an interface that doesn't wait, and the client wants to
wait, it can poll until it gets the answer it wants. Polling can be
inefficient, but anybody who is writing a tool that uses this should
be able to manage an algorithm with some reasonable back-off behavior
(e.g. try after 10ms, 20ms, keep doubling, max of 5s, or something of
that sort), so I'm not sure there's actually any real problem in
practice. So to me it seems more likely that an interface that is
based on waiting will cause difficulty for tool-writers than one that
does not.

Other people may feel differently, of course...

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-21 Thread Yugo NAGATA
On Tue, 19 Jan 2021 21:32:31 +0530
Dilip Kumar  wrote:

> On Tue, Jan 19, 2021 at 8:34 AM Dilip Kumar  wrote:
> >
> > On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA  wrote:
> >>
> >> On Sun, 17 Jan 2021 11:33:52 +0530
> >> Dilip Kumar  wrote:
> >>
> >> > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA  wrote:
> >> > >
> >> > > On Wed, 13 Jan 2021 17:49:43 +0530
> >> > > Dilip Kumar  wrote:
> >> > >
> >> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  
> >> > > > wrote:
> >> > > > >
> >> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530
> >> > > > > > Dilip Kumar  wrote:
> >> > > > > >
> >> > > > > > > > > However, I wonder users don't expect 
> >> > > > > > > > > pg_is_wal_replay_paused to wait.
> >> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this 
> >> > > > > > > > > will be blocked forever,
> >> > > > > > > > > although this setting may not be usual. In addition, some 
> >> > > > > > > > > users may set
> >> > > > > > > > > recovery_min_apply_delay for a large.  If such users call 
> >> > > > > > > > > pg_is_wal_replay_paused,
> >> > > > > > > > > it could wait for a long time.
> >> > > > > > > > >
> >> > > > > > > > > At least, I think we need some descriptions on document to 
> >> > > > > > > > > explain
> >> > > > > > > > > pg_is_wal_replay_paused could wait while a time.
> >> > > > > > > >
> >> > > > > > > > Ok
> >> > > > > > >
> >> > > > > > > Fixed this, added some comments in .sgml as well as in 
> >> > > > > > > function header
> >> > > > > >
> >> > > > > > Thank you for fixing this.
> >> > > > > >
> >> > > > > > Also, is it better to fix the description of pg_wal_replay_pause 
> >> > > > > > from
> >> > > > > > "Pauses recovery." to "Request to pause recovery." in according 
> >> > > > > > with
> >> > > > > > pg_is_wal_replay_paused?
> >> > > > >
> >> > > > > Okay
> >> > > > >
> >> > > > > >
> >> > > > > > > > > Also, how about adding a new boolean argument to 
> >> > > > > > > > > pg_is_wal_replay_paused to
> >> > > > > > > > > control whether this waits for recovery to get paused or 
> >> > > > > > > > > not? By setting its
> >> > > > > > > > > default value to true or false, users can use the old 
> >> > > > > > > > > format for calling this
> >> > > > > > > > > and the backward compatibility can be maintained.
> >> > > > > > > >
> >> > > > > > > > So basically, if the wait_recovery_pause flag is false then 
> >> > > > > > > > we will
> >> > > > > > > > immediately return true if the pause is requested?  I agree 
> >> > > > > > > > that it is
> >> > > > > > > > good to have an API to know whether the recovery pause is 
> >> > > > > > > > requested or
> >> > > > > > > > not but I am not sure is it good idea to make this API serve 
> >> > > > > > > > both the
> >> > > > > > > > purpose?  Anyone else have any thoughts on this?
> >> > > > > > > >
> >> > > > > >
> >> > > > > > I think the current pg_is_wal_replay_paused() already has 
> >> > > > > > another purpose;
> >> > > > > > this waits recovery to actually get paused. If we want to limit 
> >> > > > > > this API's
> >> > > > > > purpose only to return the pause state, it seems better to fix 
> >> > > > > > this to return
> >> > > > > > the actual state at the cost of lacking the backward 
> >> > > > > > compatibility. If we want
> >> > > > > > to know whether pause is requested, we may add a new API like
> >> > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait 
> >> > > > > > recovery to actually
> >> > > > > > get paused, we may add an option to pg_wal_replay_pause() for 
> >> > > > > > this purpose.
> >> > > > > >
> >> > > > > > However, this might be a bikeshedding. If anyone don't care that
> >> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I 
> >> > > > > > don't care either.
> >> > > > >
> >> > > > > I don't think that it will be blocked ever, because
> >> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> >> > > > > recovery process will not be stuck on waiting for the WAL.
> >> > >
> >> > > Yes, there is no stuck on waiting for the WAL. However, it can be 
> >> > > stuck during resolving
> >> > > a recovery conflict. The process could wait for 
> >> > > max_standby_streaming_delay or
> >> > > max_standby_archive_delay at most before recovery get completely 
> >> > > paused.
> >> >
> >> > Okay, I agree that it is possible so for handling this we have a
> >> > couple of options
> >> > 1. pg_is_wal_replay_paused(), interface will wait for recovery to
> >> > actually get paused, but user have an option to cancel that.  So I
> >> > agree that there is currently no option to just know that recovery
> >> > pause is requested without waiting for its actually get paused if it
> >> > is requested.  So one option is we can provide an another interface as
> >> > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just
> >> > return the request status.  I 

Re: Is Recovery actually paused?

2021-01-21 Thread Dilip Kumar
On Thu, Jan 21, 2021 at 3:29 PM Bharath Rupireddy
 wrote:

Thanks for reviewing Bharat.

> On Tue, Jan 19, 2021 at 9:32 PM Dilip Kumar  wrote:
> > In the last patch there were some local changes which I did not add to
> > the patch and it was giving compilation warning so fixed that along
> > with that I have addressed your this comment as well.
>
> Thanks for the patch. I took a look at the v5 patch, below are some
> comments. Please ignore if I'm repeating any of the comments discussed
> upthread.
>
> [1] Can we also have a wait version for pg_wal_replay_pause that waits
> until recovery is actually paused right after setting the recovery
> state to RECOVERY_PAUSE_REQUESTED? Something like this -
> pg_wal_replay_pause_and_wait(wait boolean, wait_seconds integer
> DEFAULT 60) returns boolean. It waits until either default or provided
> wait_seconds and returns true if the recovery is paused within that
> wait_seconds otherwise false. If wait_seconds is 0 or -1, then it
> waits until recovery is paused and returns true. One advantage of this
> function is that users don't need to call pg_is_wal_replay_paused().
> IMHO, the job of ensuring whether or not the recovery is actually
> paused, is better done by the one who requests
> it(pg_wal_replay_pause/pg_wal_replay_pause_and_wait).

I don't think we need wait/onwait version for all the APIs,  IMHO it
would be enough for the user to know whether the recovery is actually
paused or not
and for that, we are changing pg_is_wal_replay_paused to wait for the
pause.  However, in the next version in pg_is_wal_replay_paused I will
provide a flag so that the user can decide whether to wait for the
pause or just get the request status.

> [2] Is it intentional that RecoveryPauseRequested() returns true even
> if XLogCtl->recoveryPauseState is either RECOVERY_PAUSE_REQUESTED or
> RECOVERY_PAUSED?

Yes this is intended

> [3] Can we change IsRecoveryPaused() instead of RecoveryIsPaused() and
> IsRecoveryPauseRequested() instead of RecoveryPauseRequested()? How
> about having inline(because they have one line of code) functions like
> IsRecoveryPauseRequested(), IsRecoveryPaused() and
> IsRecoveryInProgress(), returning true when RECOVERY_PAUSE_REQUESTED,
> RECOVERY_PAUSED and RECOVERY_IN_PROGRESS respectively?

Yeah, we can do that, I am not sure whether we need
IsRecoveryInProgress function though.

> [4] Can we have at least one line of comments for each of the new
> functions, I know the function names mean everything? And also for
> existing SetRecoveryPause() and GetRecoveryPauseState()?

Will do that

> [5] Typo, it's "every time" ---> + * this everytime.

Ok

> [6] Do we reach PG_RETURN_BOOL(true); at the end of
> pg_is_wal_replay_paused()? If not, is it there for satisfying the
> compiler?

Yes

> [7] In pg_is_wal_replay_paused(), do we need if
> (!RecoveryPauseRequested()) inside the for (;;)? If yes, can we add
> comments about why we need it there?

Yes, we need it if replay resumed during the loop,  I will add the comments.

> [8] Isn't it good to have
> pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE) and
> pgstat_report_wait_end(), just before for (;;) and at the end
> respectively? This will help users in knowing what they are waiting
> for? Alternatively we can issue notice/warnings/write to server log
> while we are waiting in for (;;) for the recovery to get paused?

I think we can do that, let me think about this and get back to you.

> [9] In pg_is_wal_replay_paused(), is 10 msec sleep time chosen
> randomly or based on some analysis that the time it takes to get to
> recovery paused state from request state or some other?

I don't think we can identify that when actually recovery can get
paused.  Though in pg_wal_replay_pause() we are sending a signal to
all the places we are waiting for WAL and right after we are checking.
But it depends upon other configuration parameters like
max_standby_streaming_delay.

> [10]  errhint("The standby was promoted while waiting for recovery to
> be paused."))); Can we know whether standby is actually promoted and
> throw this error? Because the error "recovery is not in progress" here
> is being thrown by just relying on if (!RecoveryInProgress()). IIUC,
> using pg_promote

Because before checking this it has already checked
RecoveryPauseRequested() and if that is true then the
RecoveryInProgress was in progress at some time and now it is not
anymore and that can happen due to promotion.  But I am fine with
reverting to the old error that can not execute if recovery is not in
progress.

> [11] Can we try to add tests for these functions in TAP? Currently, we
> don't have any tests for pg_is_wal_replay_paused, pg_wal_replay_resume
> or pg_wal_replay_pause, but we have tests for pg_promote in timeline
> switch.

I will work on this.

> [12] Isn't it better to change RecoveryPauseState enum
> RECOVERY_IN_PROGRESS value to start from 1 instead of 0? Because when
> XLogCtl shared memory is initialized, I 

Re: Is Recovery actually paused?

2021-01-21 Thread Bharath Rupireddy
On Tue, Jan 19, 2021 at 9:32 PM Dilip Kumar  wrote:
> In the last patch there were some local changes which I did not add to
> the patch and it was giving compilation warning so fixed that along
> with that I have addressed your this comment as well.

Thanks for the patch. I took a look at the v5 patch, below are some
comments. Please ignore if I'm repeating any of the comments discussed
upthread.

[1] Can we also have a wait version for pg_wal_replay_pause that waits
until recovery is actually paused right after setting the recovery
state to RECOVERY_PAUSE_REQUESTED? Something like this -
pg_wal_replay_pause_and_wait(wait boolean, wait_seconds integer
DEFAULT 60) returns boolean. It waits until either default or provided
wait_seconds and returns true if the recovery is paused within that
wait_seconds otherwise false. If wait_seconds is 0 or -1, then it
waits until recovery is paused and returns true. One advantage of this
function is that users don't need to call pg_is_wal_replay_paused().
IMHO, the job of ensuring whether or not the recovery is actually
paused, is better done by the one who requests
it(pg_wal_replay_pause/pg_wal_replay_pause_and_wait).

[2] Is it intentional that RecoveryPauseRequested() returns true even
if XLogCtl->recoveryPauseState is either RECOVERY_PAUSE_REQUESTED or
RECOVERY_PAUSED?

[3] Can we change IsRecoveryPaused() instead of RecoveryIsPaused() and
IsRecoveryPauseRequested() instead of RecoveryPauseRequested()? How
about having inline(because they have one line of code) functions like
IsRecoveryPauseRequested(), IsRecoveryPaused() and
IsRecoveryInProgress(), returning true when RECOVERY_PAUSE_REQUESTED,
RECOVERY_PAUSED and RECOVERY_IN_PROGRESS respectively?

[4] Can we have at least one line of comments for each of the new
functions, I know the function names mean everything? And also for
existing SetRecoveryPause() and GetRecoveryPauseState()?

[5] Typo, it's "every time" ---> + * this everytime.

[6] Do we reach PG_RETURN_BOOL(true); at the end of
pg_is_wal_replay_paused()? If not, is it there for satisfying the
compiler?

[7] In pg_is_wal_replay_paused(), do we need if
(!RecoveryPauseRequested()) inside the for (;;)? If yes, can we add
comments about why we need it there?

[8] Isn't it good to have
pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE) and
pgstat_report_wait_end(), just before for (;;) and at the end
respectively? This will help users in knowing what they are waiting
for? Alternatively we can issue notice/warnings/write to server log
while we are waiting in for (;;) for the recovery to get paused?

[9] In pg_is_wal_replay_paused(), is 10 msec sleep time chosen
randomly or based on some analysis that the time it takes to get to
recovery paused state from request state or some other?

[10]  errhint("The standby was promoted while waiting for recovery to
be paused."))); Can we know whether standby is actually promoted and
throw this error? Because the error "recovery is not in progress" here
is being thrown by just relying on if (!RecoveryInProgress()). IIUC,
using pg_promote

[11] Can we try to add tests for these functions in TAP? Currently, we
don't have any tests for pg_is_wal_replay_paused, pg_wal_replay_resume
or pg_wal_replay_pause, but we have tests for pg_promote in timeline
switch.

[12] Isn't it better to change RecoveryPauseState enum
RECOVERY_IN_PROGRESS value to start from 1 instead of 0? Because when
XLogCtl shared memory is initialized, I think recoveryPauseState can
be 0, so should it mean recovery in progress?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-19 Thread Dilip Kumar
On Tue, Jan 19, 2021 at 8:34 AM Dilip Kumar  wrote:
>
> On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA  wrote:
>>
>> On Sun, 17 Jan 2021 11:33:52 +0530
>> Dilip Kumar  wrote:
>>
>> > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA  wrote:
>> > >
>> > > On Wed, 13 Jan 2021 17:49:43 +0530
>> > > Dilip Kumar  wrote:
>> > >
>> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  
>> > > > wrote:
>> > > > >
>> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  
>> > > > > wrote:
>> > > > > >
>> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530
>> > > > > > Dilip Kumar  wrote:
>> > > > > >
>> > > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused 
>> > > > > > > > > to wait.
>> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this will 
>> > > > > > > > > be blocked forever,
>> > > > > > > > > although this setting may not be usual. In addition, some 
>> > > > > > > > > users may set
>> > > > > > > > > recovery_min_apply_delay for a large.  If such users call 
>> > > > > > > > > pg_is_wal_replay_paused,
>> > > > > > > > > it could wait for a long time.
>> > > > > > > > >
>> > > > > > > > > At least, I think we need some descriptions on document to 
>> > > > > > > > > explain
>> > > > > > > > > pg_is_wal_replay_paused could wait while a time.
>> > > > > > > >
>> > > > > > > > Ok
>> > > > > > >
>> > > > > > > Fixed this, added some comments in .sgml as well as in function 
>> > > > > > > header
>> > > > > >
>> > > > > > Thank you for fixing this.
>> > > > > >
>> > > > > > Also, is it better to fix the description of pg_wal_replay_pause 
>> > > > > > from
>> > > > > > "Pauses recovery." to "Request to pause recovery." in according 
>> > > > > > with
>> > > > > > pg_is_wal_replay_paused?
>> > > > >
>> > > > > Okay
>> > > > >
>> > > > > >
>> > > > > > > > > Also, how about adding a new boolean argument to 
>> > > > > > > > > pg_is_wal_replay_paused to
>> > > > > > > > > control whether this waits for recovery to get paused or 
>> > > > > > > > > not? By setting its
>> > > > > > > > > default value to true or false, users can use the old format 
>> > > > > > > > > for calling this
>> > > > > > > > > and the backward compatibility can be maintained.
>> > > > > > > >
>> > > > > > > > So basically, if the wait_recovery_pause flag is false then we 
>> > > > > > > > will
>> > > > > > > > immediately return true if the pause is requested?  I agree 
>> > > > > > > > that it is
>> > > > > > > > good to have an API to know whether the recovery pause is 
>> > > > > > > > requested or
>> > > > > > > > not but I am not sure is it good idea to make this API serve 
>> > > > > > > > both the
>> > > > > > > > purpose?  Anyone else have any thoughts on this?
>> > > > > > > >
>> > > > > >
>> > > > > > I think the current pg_is_wal_replay_paused() already has another 
>> > > > > > purpose;
>> > > > > > this waits recovery to actually get paused. If we want to limit 
>> > > > > > this API's
>> > > > > > purpose only to return the pause state, it seems better to fix 
>> > > > > > this to return
>> > > > > > the actual state at the cost of lacking the backward 
>> > > > > > compatibility. If we want
>> > > > > > to know whether pause is requested, we may add a new API like
>> > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait 
>> > > > > > recovery to actually
>> > > > > > get paused, we may add an option to pg_wal_replay_pause() for this 
>> > > > > > purpose.
>> > > > > >
>> > > > > > However, this might be a bikeshedding. If anyone don't care that
>> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I 
>> > > > > > don't care either.
>> > > > >
>> > > > > I don't think that it will be blocked ever, because
>> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
>> > > > > recovery process will not be stuck on waiting for the WAL.
>> > >
>> > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck 
>> > > during resolving
>> > > a recovery conflict. The process could wait for 
>> > > max_standby_streaming_delay or
>> > > max_standby_archive_delay at most before recovery get completely paused.
>> >
>> > Okay, I agree that it is possible so for handling this we have a
>> > couple of options
>> > 1. pg_is_wal_replay_paused(), interface will wait for recovery to
>> > actually get paused, but user have an option to cancel that.  So I
>> > agree that there is currently no option to just know that recovery
>> > pause is requested without waiting for its actually get paused if it
>> > is requested.  So one option is we can provide an another interface as
>> > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just
>> > return the request status.  I am not sure how useful it is.
>>
>> If it is acceptable that pg_is_wal_replay_paused() makes users wait,
>> I'm ok for the current interface. I don't feel the need of
>> pg_is_wal_replay_paluse_requeseted().
>>
>> >
>> > 2. Pass an option to pg_is_wal_replay_paused whether to 

Re: Is Recovery actually paused?

2021-01-18 Thread Dilip Kumar
On Tue, Jan 19, 2021 at 10:30 AM Kyotaro Horiguchi
 wrote:
>
> At Tue, 19 Jan 2021 11:41:18 +0900, Yugo NAGATA  wrote in
> > On Sun, 17 Jan 2021 11:33:52 +0530
> > Dilip Kumar  wrote:
> > > > > > >
> > > > > > > I think the current pg_is_wal_replay_paused() already has another 
> > > > > > > purpose;
> > > > > > > this waits recovery to actually get paused. If we want to limit 
> > > > > > > this API's
> > > > > > > purpose only to return the pause state, it seems better to fix 
> > > > > > > this to return
> > > > > > > the actual state at the cost of lacking the backward 
> > > > > > > compatibility. If we want
> > > > > > > to know whether pause is requested, we may add a new API like
> > > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait 
> > > > > > > recovery to actually
> > > > > > > get paused, we may add an option to pg_wal_replay_pause() for 
> > > > > > > this purpose.
> > > > > > >
> > > > > > > However, this might be a bikeshedding. If anyone don't care that
> > > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I 
> > > > > > > don't care either.
> > > > > >
> > > > > > I don't think that it will be blocked ever, because
> > > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > > > > recovery process will not be stuck on waiting for the WAL.
> > > >
> > > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck 
> > > > during resolving
> > > > a recovery conflict. The process could wait for 
> > > > max_standby_streaming_delay or
> > > > max_standby_archive_delay at most before recovery get completely paused.
> > >
> > > Okay, I agree that it is possible so for handling this we have a
> > > couple of options
> > > 1. pg_is_wal_replay_paused(), interface will wait for recovery to
> > > actually get paused, but user have an option to cancel that.  So I
> > > agree that there is currently no option to just know that recovery
> > > pause is requested without waiting for its actually get paused if it
> > > is requested.  So one option is we can provide an another interface as
> > > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just
> > > return the request status.  I am not sure how useful it is.
> >
> > If it is acceptable that pg_is_wal_replay_paused() makes users wait,
> > I'm ok for the current interface. I don't feel the need of
> > pg_is_wal_replay_paluse_requeseted().
>
> FWIW, the name "pg_is_wal_replay_paused" is suggesting "to know
> whether recovery is paused or not at present" and it would be
> surprising to see it to wait for the recovery actually paused by
> default.
>
> I think there's no functions to wait for some situation at least for
> now.  If we wanted to wait for some condition to make, we would loop
> over check-and-wait using plpgsql.
>
> If you desire to wait to replication to pause by a function, I would
> do that by adding a parameter to the function.
>
> pg_is_wal_replay_paused(OPTIONAL bool wait_for_pause)

This seems to be a fair point to me.  So I will add an option to the
API,  and if that is passed true then we will wait for recovery to get
paused.
otherwise, this will just return true if the pause is requested same
as the current behavior.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-18 Thread Kyotaro Horiguchi
At Tue, 19 Jan 2021 11:41:18 +0900, Yugo NAGATA  wrote in 
> On Sun, 17 Jan 2021 11:33:52 +0530
> Dilip Kumar  wrote:
> > > > > >
> > > > > > I think the current pg_is_wal_replay_paused() already has another 
> > > > > > purpose;
> > > > > > this waits recovery to actually get paused. If we want to limit 
> > > > > > this API's
> > > > > > purpose only to return the pause state, it seems better to fix this 
> > > > > > to return
> > > > > > the actual state at the cost of lacking the backward compatibility. 
> > > > > > If we want
> > > > > > to know whether pause is requested, we may add a new API like
> > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait 
> > > > > > recovery to actually
> > > > > > get paused, we may add an option to pg_wal_replay_pause() for this 
> > > > > > purpose.
> > > > > >
> > > > > > However, this might be a bikeshedding. If anyone don't care that
> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I 
> > > > > > don't care either.
> > > > >
> > > > > I don't think that it will be blocked ever, because
> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > > > recovery process will not be stuck on waiting for the WAL.
> > >
> > > Yes, there is no stuck on waiting for the WAL. However, it can be stuck 
> > > during resolving
> > > a recovery conflict. The process could wait for 
> > > max_standby_streaming_delay or
> > > max_standby_archive_delay at most before recovery get completely paused.
> > 
> > Okay, I agree that it is possible so for handling this we have a
> > couple of options
> > 1. pg_is_wal_replay_paused(), interface will wait for recovery to
> > actually get paused, but user have an option to cancel that.  So I
> > agree that there is currently no option to just know that recovery
> > pause is requested without waiting for its actually get paused if it
> > is requested.  So one option is we can provide an another interface as
> > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just
> > return the request status.  I am not sure how useful it is.
> 
> If it is acceptable that pg_is_wal_replay_paused() makes users wait, 
> I'm ok for the current interface. I don't feel the need of
> pg_is_wal_replay_paluse_requeseted().

FWIW, the name "pg_is_wal_replay_paused" is suggesting "to know
whether recovery is paused or not at present" and it would be
surprising to see it to wait for the recovery actually paused by
default.

I think there's no functions to wait for some situation at least for
now.  If we wanted to wait for some condition to make, we would loop
over check-and-wait using plpgsql.

If you desire to wait to replication to pause by a function, I would
do that by adding a parameter to the function.

pg_is_wal_replay_paused(OPTIONAL bool wait_for_pause)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is Recovery actually paused?

2021-01-18 Thread Dilip Kumar
On Tue, 19 Jan 2021 at 8:12 AM, Yugo NAGATA  wrote:

> On Sun, 17 Jan 2021 11:33:52 +0530
> Dilip Kumar  wrote:
>
> > On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA  wrote:
> > >
> > > On Wed, 13 Jan 2021 17:49:43 +0530
> > > Dilip Kumar  wrote:
> > >
> > > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar 
> wrote:
> > > > >
> > > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA 
> wrote:
> > > > > >
> > > > > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > > > > Dilip Kumar  wrote:
> > > > > >
> > > > > > > > > However, I wonder users don't expect
> pg_is_wal_replay_paused to wait.
> > > > > > > > > Especially, if max_standby_streaming_delay is -1, this
> will be blocked forever,
> > > > > > > > > although this setting may not be usual. In addition, some
> users may set
> > > > > > > > > recovery_min_apply_delay for a large.  If such users call
> pg_is_wal_replay_paused,
> > > > > > > > > it could wait for a long time.
> > > > > > > > >
> > > > > > > > > At least, I think we need some descriptions on document to
> explain
> > > > > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > > > > >
> > > > > > > > Ok
> > > > > > >
> > > > > > > Fixed this, added some comments in .sgml as well as in
> function header
> > > > > >
> > > > > > Thank you for fixing this.
> > > > > >
> > > > > > Also, is it better to fix the description of pg_wal_replay_pause
> from
> > > > > > "Pauses recovery." to "Request to pause recovery." in according
> with
> > > > > > pg_is_wal_replay_paused?
> > > > >
> > > > > Okay
> > > > >
> > > > > >
> > > > > > > > > Also, how about adding a new boolean argument to
> pg_is_wal_replay_paused to
> > > > > > > > > control whether this waits for recovery to get paused or
> not? By setting its
> > > > > > > > > default value to true or false, users can use the old
> format for calling this
> > > > > > > > > and the backward compatibility can be maintained.
> > > > > > > >
> > > > > > > > So basically, if the wait_recovery_pause flag is false then
> we will
> > > > > > > > immediately return true if the pause is requested?  I agree
> that it is
> > > > > > > > good to have an API to know whether the recovery pause is
> requested or
> > > > > > > > not but I am not sure is it good idea to make this API serve
> both the
> > > > > > > > purpose?  Anyone else have any thoughts on this?
> > > > > > > >
> > > > > >
> > > > > > I think the current pg_is_wal_replay_paused() already has
> another purpose;
> > > > > > this waits recovery to actually get paused. If we want to limit
> this API's
> > > > > > purpose only to return the pause state, it seems better to fix
> this to return
> > > > > > the actual state at the cost of lacking the backward
> compatibility. If we want
> > > > > > to know whether pause is requested, we may add a new API like
> > > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait
> recovery to actually
> > > > > > get paused, we may add an option to pg_wal_replay_pause() for
> this purpose.
> > > > > >
> > > > > > However, this might be a bikeshedding. If anyone don't care that
> > > > > > pg_is_wal_replay_paused() can make user wait for a long time, I
> don't care either.
> > > > >
> > > > > I don't think that it will be blocked ever, because
> > > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > > > recovery process will not be stuck on waiting for the WAL.
> > >
> > > Yes, there is no stuck on waiting for the WAL. However, it can be
> stuck during resolving
> > > a recovery conflict. The process could wait for
> max_standby_streaming_delay or
> > > max_standby_archive_delay at most before recovery get completely
> paused.
> >
> > Okay, I agree that it is possible so for handling this we have a
> > couple of options
> > 1. pg_is_wal_replay_paused(), interface will wait for recovery to
> > actually get paused, but user have an option to cancel that.  So I
> > agree that there is currently no option to just know that recovery
> > pause is requested without waiting for its actually get paused if it
> > is requested.  So one option is we can provide an another interface as
> > you mentioned pg_is_wal_replay_paluse_requeseted(), which can just
> > return the request status.  I am not sure how useful it is.
>
> If it is acceptable that pg_is_wal_replay_paused() makes users wait,
> I'm ok for the current interface. I don't feel the need of
> pg_is_wal_replay_paluse_requeseted().
>
> >
> > 2. Pass an option to pg_is_wal_replay_paused whether to wait for
> > recovery to actually get paused or not.
> >
> > 3. Pass an option to pg_wal_replay_pause(), whether to wait for
> > recovery pause or just request and return.
> >
> > I like the option 1, any other opinion on this?
> >
> > > Also, it could wait for recovery_min_apply_delay if it has a valid
> value. It is possible
> > > that a user set this parameter to a large value, so it could wait for
> a long time. However,
> > > this will be avoided by calling recoveryPausesHere() or
> 

Re: Is Recovery actually paused?

2021-01-18 Thread Yugo NAGATA
On Sun, 17 Jan 2021 11:33:52 +0530
Dilip Kumar  wrote:

> On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA  wrote:
> >
> > On Wed, 13 Jan 2021 17:49:43 +0530
> > Dilip Kumar  wrote:
> >
> > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > > > Dilip Kumar  wrote:
> > > > >
> > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to 
> > > > > > > > wait.
> > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be 
> > > > > > > > blocked forever,
> > > > > > > > although this setting may not be usual. In addition, some users 
> > > > > > > > may set
> > > > > > > > recovery_min_apply_delay for a large.  If such users call 
> > > > > > > > pg_is_wal_replay_paused,
> > > > > > > > it could wait for a long time.
> > > > > > > >
> > > > > > > > At least, I think we need some descriptions on document to 
> > > > > > > > explain
> > > > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > > > >
> > > > > > > Ok
> > > > > >
> > > > > > Fixed this, added some comments in .sgml as well as in function 
> > > > > > header
> > > > >
> > > > > Thank you for fixing this.
> > > > >
> > > > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > > > "Pauses recovery." to "Request to pause recovery." in according with
> > > > > pg_is_wal_replay_paused?
> > > >
> > > > Okay
> > > >
> > > > >
> > > > > > > > Also, how about adding a new boolean argument to 
> > > > > > > > pg_is_wal_replay_paused to
> > > > > > > > control whether this waits for recovery to get paused or not? 
> > > > > > > > By setting its
> > > > > > > > default value to true or false, users can use the old format 
> > > > > > > > for calling this
> > > > > > > > and the backward compatibility can be maintained.
> > > > > > >
> > > > > > > So basically, if the wait_recovery_pause flag is false then we 
> > > > > > > will
> > > > > > > immediately return true if the pause is requested?  I agree that 
> > > > > > > it is
> > > > > > > good to have an API to know whether the recovery pause is 
> > > > > > > requested or
> > > > > > > not but I am not sure is it good idea to make this API serve both 
> > > > > > > the
> > > > > > > purpose?  Anyone else have any thoughts on this?
> > > > > > >
> > > > >
> > > > > I think the current pg_is_wal_replay_paused() already has another 
> > > > > purpose;
> > > > > this waits recovery to actually get paused. If we want to limit this 
> > > > > API's
> > > > > purpose only to return the pause state, it seems better to fix this 
> > > > > to return
> > > > > the actual state at the cost of lacking the backward compatibility. 
> > > > > If we want
> > > > > to know whether pause is requested, we may add a new API like
> > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait 
> > > > > recovery to actually
> > > > > get paused, we may add an option to pg_wal_replay_pause() for this 
> > > > > purpose.
> > > > >
> > > > > However, this might be a bikeshedding. If anyone don't care that
> > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't 
> > > > > care either.
> > > >
> > > > I don't think that it will be blocked ever, because
> > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > > recovery process will not be stuck on waiting for the WAL.
> >
> > Yes, there is no stuck on waiting for the WAL. However, it can be stuck 
> > during resolving
> > a recovery conflict. The process could wait for max_standby_streaming_delay 
> > or
> > max_standby_archive_delay at most before recovery get completely paused.
> 
> Okay, I agree that it is possible so for handling this we have a
> couple of options
> 1. pg_is_wal_replay_paused(), interface will wait for recovery to
> actually get paused, but user have an option to cancel that.  So I
> agree that there is currently no option to just know that recovery
> pause is requested without waiting for its actually get paused if it
> is requested.  So one option is we can provide an another interface as
> you mentioned pg_is_wal_replay_paluse_requeseted(), which can just
> return the request status.  I am not sure how useful it is.

If it is acceptable that pg_is_wal_replay_paused() makes users wait, 
I'm ok for the current interface. I don't feel the need of
pg_is_wal_replay_paluse_requeseted().

> 
> 2. Pass an option to pg_is_wal_replay_paused whether to wait for
> recovery to actually get paused or not.
> 
> 3. Pass an option to pg_wal_replay_pause(), whether to wait for
> recovery pause or just request and return.
> 
> I like the option 1, any other opinion on this?
> 
> > Also, it could wait for recovery_min_apply_delay if it has a valid value. 
> > It is possible
> > that a user set this parameter to a large value, so it could wait for a 
> > long time. However,
> > this will be avoided by calling recoveryPausesHere() 

Re: Is Recovery actually paused?

2021-01-17 Thread Dilip Kumar
On Sun, Jan 17, 2021 at 1:48 PM Dilip Kumar  wrote:
>
> On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada  wrote:
> >
> > On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  wrote:
> > > >
> > > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  wrote:
> > > > >
> > > > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > > > Dilip Kumar  wrote:
> > > > >
> > > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to 
> > > > > > > > wait.
> > > > > > > > Especially, if max_standby_streaming_delay is -1, this will be 
> > > > > > > > blocked forever,
> > > > > > > > although this setting may not be usual. In addition, some users 
> > > > > > > > may set
> > > > > > > > recovery_min_apply_delay for a large.  If such users call 
> > > > > > > > pg_is_wal_replay_paused,
> > > > > > > > it could wait for a long time.
> > > > > > > >
> > > > > > > > At least, I think we need some descriptions on document to 
> > > > > > > > explain
> > > > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > > > >
> > > > > > > Ok
> > > > > >
> > > > > > Fixed this, added some comments in .sgml as well as in function 
> > > > > > header
> > > > >
> > > > > Thank you for fixing this.
> > > > >
> > > > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > > > "Pauses recovery." to "Request to pause recovery." in according with
> > > > > pg_is_wal_replay_paused?
> > > >
> > > > Okay
> > > >
> > > > >
> > > > > > > > Also, how about adding a new boolean argument to 
> > > > > > > > pg_is_wal_replay_paused to
> > > > > > > > control whether this waits for recovery to get paused or not? 
> > > > > > > > By setting its
> > > > > > > > default value to true or false, users can use the old format 
> > > > > > > > for calling this
> > > > > > > > and the backward compatibility can be maintained.
> > > > > > >
> > > > > > > So basically, if the wait_recovery_pause flag is false then we 
> > > > > > > will
> > > > > > > immediately return true if the pause is requested?  I agree that 
> > > > > > > it is
> > > > > > > good to have an API to know whether the recovery pause is 
> > > > > > > requested or
> > > > > > > not but I am not sure is it good idea to make this API serve both 
> > > > > > > the
> > > > > > > purpose?  Anyone else have any thoughts on this?
> > > > > > >
> > > > >
> > > > > I think the current pg_is_wal_replay_paused() already has another 
> > > > > purpose;
> > > > > this waits recovery to actually get paused. If we want to limit this 
> > > > > API's
> > > > > purpose only to return the pause state, it seems better to fix this 
> > > > > to return
> > > > > the actual state at the cost of lacking the backward compatibility. 
> > > > > If we want
> > > > > to know whether pause is requested, we may add a new API like
> > > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait 
> > > > > recovery to actually
> > > > > get paused, we may add an option to pg_wal_replay_pause() for this 
> > > > > purpose.
> > > > >
> > > > > However, this might be a bikeshedding. If anyone don't care that
> > > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't 
> > > > > care either.
> > > >
> > > > I don't think that it will be blocked ever, because
> > > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > > recovery process will not be stuck on waiting for the WAL.
> > > >
> > > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, 
> > > > > > > > I can not cancel
> > > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the 
> > > > > > > > waiting loop.
> > > > >
> > > > > How about this fix? I think users may want to cancel 
> > > > > pg_is_wal_replay_paused() during
> > > > > this is blocking.
> > > >
> > > > Yeah, we can do this.  I will send the updated patch after putting
> > > > some more thought into these comments.  Thanks again for the feedback.
> > > >
> > >
> > > Please find the updated patch.
> >
> > I've looked at the patch. Here are review comments:
> >
> > +   /* Recovery pause state */
> > +   RecoveryPauseState  recoveryPause;
> >
> > Now that the value can have tri-state, how about renaming it to
> > recoveryPauseState?
>
> This makes sense to me.
>
> > ---
> >  bool
> >  RecoveryIsPaused(void)
> > +{
> > +   boolrecoveryPause;
> > +
> > +   SpinLockAcquire(>info_lck);
> > +   recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ?
> > true : false;
> > +   SpinLockRelease(>info_lck);
> > +
> > +   return recoveryPause;
> > +}
> > +
> > +bool
> > +RecoveryPauseRequested(void)
> >  {
> > boolrecoveryPause;
> >
> > SpinLockAcquire(>info_lck);
> > -   recoveryPause = XLogCtl->recoveryPause;
> > +   recoveryPause = (XLogCtl->recoveryPause !=
> > RECOVERY_IN_PROGRESS) ? true : false;
> > SpinLockRelease(>info_lck);
> >
> > return 

Re: Is Recovery actually paused?

2021-01-17 Thread Dilip Kumar
On Sat, Jan 16, 2021 at 8:59 AM Masahiko Sawada  wrote:
>
> On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar  wrote:
> >
> > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  wrote:
> > > >
> > > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > > Dilip Kumar  wrote:
> > > >
> > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to 
> > > > > > > wait.
> > > > > > > Especially, if max_standby_streaming_delay is -1, this will be 
> > > > > > > blocked forever,
> > > > > > > although this setting may not be usual. In addition, some users 
> > > > > > > may set
> > > > > > > recovery_min_apply_delay for a large.  If such users call 
> > > > > > > pg_is_wal_replay_paused,
> > > > > > > it could wait for a long time.
> > > > > > >
> > > > > > > At least, I think we need some descriptions on document to explain
> > > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > > >
> > > > > > Ok
> > > > >
> > > > > Fixed this, added some comments in .sgml as well as in function header
> > > >
> > > > Thank you for fixing this.
> > > >
> > > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > > "Pauses recovery." to "Request to pause recovery." in according with
> > > > pg_is_wal_replay_paused?
> > >
> > > Okay
> > >
> > > >
> > > > > > > Also, how about adding a new boolean argument to 
> > > > > > > pg_is_wal_replay_paused to
> > > > > > > control whether this waits for recovery to get paused or not? By 
> > > > > > > setting its
> > > > > > > default value to true or false, users can use the old format for 
> > > > > > > calling this
> > > > > > > and the backward compatibility can be maintained.
> > > > > >
> > > > > > So basically, if the wait_recovery_pause flag is false then we will
> > > > > > immediately return true if the pause is requested?  I agree that it 
> > > > > > is
> > > > > > good to have an API to know whether the recovery pause is requested 
> > > > > > or
> > > > > > not but I am not sure is it good idea to make this API serve both 
> > > > > > the
> > > > > > purpose?  Anyone else have any thoughts on this?
> > > > > >
> > > >
> > > > I think the current pg_is_wal_replay_paused() already has another 
> > > > purpose;
> > > > this waits recovery to actually get paused. If we want to limit this 
> > > > API's
> > > > purpose only to return the pause state, it seems better to fix this to 
> > > > return
> > > > the actual state at the cost of lacking the backward compatibility. If 
> > > > we want
> > > > to know whether pause is requested, we may add a new API like
> > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery 
> > > > to actually
> > > > get paused, we may add an option to pg_wal_replay_pause() for this 
> > > > purpose.
> > > >
> > > > However, this might be a bikeshedding. If anyone don't care that
> > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't 
> > > > care either.
> > >
> > > I don't think that it will be blocked ever, because
> > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > recovery process will not be stuck on waiting for the WAL.
> > >
> > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I 
> > > > > > > can not cancel
> > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the 
> > > > > > > waiting loop.
> > > >
> > > > How about this fix? I think users may want to cancel 
> > > > pg_is_wal_replay_paused() during
> > > > this is blocking.
> > >
> > > Yeah, we can do this.  I will send the updated patch after putting
> > > some more thought into these comments.  Thanks again for the feedback.
> > >
> >
> > Please find the updated patch.
>
> I've looked at the patch. Here are review comments:
>
> +   /* Recovery pause state */
> +   RecoveryPauseState  recoveryPause;
>
> Now that the value can have tri-state, how about renaming it to
> recoveryPauseState?

This makes sense to me.

> ---
>  bool
>  RecoveryIsPaused(void)
> +{
> +   boolrecoveryPause;
> +
> +   SpinLockAcquire(>info_lck);
> +   recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ?
> true : false;
> +   SpinLockRelease(>info_lck);
> +
> +   return recoveryPause;
> +}
> +
> +bool
> +RecoveryPauseRequested(void)
>  {
> boolrecoveryPause;
>
> SpinLockAcquire(>info_lck);
> -   recoveryPause = XLogCtl->recoveryPause;
> +   recoveryPause = (XLogCtl->recoveryPause !=
> RECOVERY_IN_PROGRESS) ? true : false;
> SpinLockRelease(>info_lck);
>
> return recoveryPause;
>  }
>
> We can write like recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED);

In RecoveryPauseRequested, we just want to know whether the pause is
requested or not, even if the pause requested and not yet pause then
also we want to return true. So how
recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) will work?

> 

Re: Is Recovery actually paused?

2021-01-16 Thread Dilip Kumar
On Thu, Jan 14, 2021 at 6:49 PM Yugo NAGATA  wrote:
>
> On Wed, 13 Jan 2021 17:49:43 +0530
> Dilip Kumar  wrote:
>
> > On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  wrote:
> > >
> > > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  wrote:
> > > >
> > > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > > Dilip Kumar  wrote:
> > > >
> > > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to 
> > > > > > > wait.
> > > > > > > Especially, if max_standby_streaming_delay is -1, this will be 
> > > > > > > blocked forever,
> > > > > > > although this setting may not be usual. In addition, some users 
> > > > > > > may set
> > > > > > > recovery_min_apply_delay for a large.  If such users call 
> > > > > > > pg_is_wal_replay_paused,
> > > > > > > it could wait for a long time.
> > > > > > >
> > > > > > > At least, I think we need some descriptions on document to explain
> > > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > > >
> > > > > > Ok
> > > > >
> > > > > Fixed this, added some comments in .sgml as well as in function header
> > > >
> > > > Thank you for fixing this.
> > > >
> > > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > > "Pauses recovery." to "Request to pause recovery." in according with
> > > > pg_is_wal_replay_paused?
> > >
> > > Okay
> > >
> > > >
> > > > > > > Also, how about adding a new boolean argument to 
> > > > > > > pg_is_wal_replay_paused to
> > > > > > > control whether this waits for recovery to get paused or not? By 
> > > > > > > setting its
> > > > > > > default value to true or false, users can use the old format for 
> > > > > > > calling this
> > > > > > > and the backward compatibility can be maintained.
> > > > > >
> > > > > > So basically, if the wait_recovery_pause flag is false then we will
> > > > > > immediately return true if the pause is requested?  I agree that it 
> > > > > > is
> > > > > > good to have an API to know whether the recovery pause is requested 
> > > > > > or
> > > > > > not but I am not sure is it good idea to make this API serve both 
> > > > > > the
> > > > > > purpose?  Anyone else have any thoughts on this?
> > > > > >
> > > >
> > > > I think the current pg_is_wal_replay_paused() already has another 
> > > > purpose;
> > > > this waits recovery to actually get paused. If we want to limit this 
> > > > API's
> > > > purpose only to return the pause state, it seems better to fix this to 
> > > > return
> > > > the actual state at the cost of lacking the backward compatibility. If 
> > > > we want
> > > > to know whether pause is requested, we may add a new API like
> > > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery 
> > > > to actually
> > > > get paused, we may add an option to pg_wal_replay_pause() for this 
> > > > purpose.
> > > >
> > > > However, this might be a bikeshedding. If anyone don't care that
> > > > pg_is_wal_replay_paused() can make user wait for a long time, I don't 
> > > > care either.
> > >
> > > I don't think that it will be blocked ever, because
> > > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > > recovery process will not be stuck on waiting for the WAL.
>
> Yes, there is no stuck on waiting for the WAL. However, it can be stuck 
> during resolving
> a recovery conflict. The process could wait for max_standby_streaming_delay or
> max_standby_archive_delay at most before recovery get completely paused.

Okay, I agree that it is possible so for handling this we have a
couple of options
1. pg_is_wal_replay_paused(), interface will wait for recovery to
actually get paused, but user have an option to cancel that.  So I
agree that there is currently no option to just know that recovery
pause is requested without waiting for its actually get paused if it
is requested.  So one option is we can provide an another interface as
you mentioned pg_is_wal_replay_paluse_requeseted(), which can just
return the request status.  I am not sure how useful it is.

2. Pass an option to pg_is_wal_replay_paused whether to wait for
recovery to actually get paused or not.

3. Pass an option to pg_wal_replay_pause(), whether to wait for
recovery pause or just request and return.

I like the option 1, any other opinion on this?

> Also, it could wait for recovery_min_apply_delay if it has a valid value. It 
> is possible
> that a user set this parameter to a large value, so it could wait for a long 
> time. However,
> this will be avoided by calling recoveryPausesHere() or 
> CheckAndSetRecoveryPause() in
> recoveryApplyDelay().

Right

> > > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I 
> > > > > > > can not cancel
> > > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the 
> > > > > > > waiting loop.
> > > >
> > > > How about this fix? I think users may want to cancel 
> > > > pg_is_wal_replay_paused() during
> > > > this is blocking.
> > >
> > > Yeah, we can do this.  I will send the updated patch 

Re: Is Recovery actually paused?

2021-01-16 Thread Dilip Kumar
On Sun, Jan 17, 2021 at 3:52 AM Masahiko Sawada  wrote:
>
> On Sat, Jan 16, 2021 at 12:28 PM Masahiko Sawada  
> wrote:
> >
> > ---
> > +   /* test for recovery pause if user has requested the pause */
> > +   if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
> > +   recoveryPausesHere(false);
> > +
> > +   now = GetCurrentTimestamp();
> > +
> >
> > Hmm, if the recovery pauses here, the wal receiver isn't launched even
> > when wal_retrieve_retry_interval has passed, which seems not good. I
> > think we want the recovery to be paused but want the wal receiver to
> > continue receiving WAL.
>
> I had misunderstood the code and the patch, please ignore this
> comment. Pausing the recovery here is fine with me.

Thanks for the review Sawada-San,  I will work on your other comments
and post the patch.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is Recovery actually paused?

2021-01-16 Thread Masahiko Sawada
On Sat, Jan 16, 2021 at 12:28 PM Masahiko Sawada  wrote:
>
> ---
> +   /* test for recovery pause if user has requested the pause */
> +   if (((volatile XLogCtlData *) XLogCtl)->recoveryPause)
> +   recoveryPausesHere(false);
> +
> +   now = GetCurrentTimestamp();
> +
>
> Hmm, if the recovery pauses here, the wal receiver isn't launched even
> when wal_retrieve_retry_interval has passed, which seems not good. I
> think we want the recovery to be paused but want the wal receiver to
> continue receiving WAL.

I had misunderstood the code and the patch, please ignore this
comment. Pausing the recovery here is fine with me.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/




Re: Is Recovery actually paused?

2021-01-15 Thread Masahiko Sawada
On Wed, Jan 13, 2021 at 9:20 PM Dilip Kumar  wrote:
>
> On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  wrote:
> >
> > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  wrote:
> > >
> > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > Dilip Kumar  wrote:
> > >
> > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to 
> > > > > > wait.
> > > > > > Especially, if max_standby_streaming_delay is -1, this will be 
> > > > > > blocked forever,
> > > > > > although this setting may not be usual. In addition, some users may 
> > > > > > set
> > > > > > recovery_min_apply_delay for a large.  If such users call 
> > > > > > pg_is_wal_replay_paused,
> > > > > > it could wait for a long time.
> > > > > >
> > > > > > At least, I think we need some descriptions on document to explain
> > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > >
> > > > > Ok
> > > >
> > > > Fixed this, added some comments in .sgml as well as in function header
> > >
> > > Thank you for fixing this.
> > >
> > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > "Pauses recovery." to "Request to pause recovery." in according with
> > > pg_is_wal_replay_paused?
> >
> > Okay
> >
> > >
> > > > > > Also, how about adding a new boolean argument to 
> > > > > > pg_is_wal_replay_paused to
> > > > > > control whether this waits for recovery to get paused or not? By 
> > > > > > setting its
> > > > > > default value to true or false, users can use the old format for 
> > > > > > calling this
> > > > > > and the backward compatibility can be maintained.
> > > > >
> > > > > So basically, if the wait_recovery_pause flag is false then we will
> > > > > immediately return true if the pause is requested?  I agree that it is
> > > > > good to have an API to know whether the recovery pause is requested or
> > > > > not but I am not sure is it good idea to make this API serve both the
> > > > > purpose?  Anyone else have any thoughts on this?
> > > > >
> > >
> > > I think the current pg_is_wal_replay_paused() already has another purpose;
> > > this waits recovery to actually get paused. If we want to limit this API's
> > > purpose only to return the pause state, it seems better to fix this to 
> > > return
> > > the actual state at the cost of lacking the backward compatibility. If we 
> > > want
> > > to know whether pause is requested, we may add a new API like
> > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery 
> > > to actually
> > > get paused, we may add an option to pg_wal_replay_pause() for this 
> > > purpose.
> > >
> > > However, this might be a bikeshedding. If anyone don't care that
> > > pg_is_wal_replay_paused() can make user wait for a long time, I don't 
> > > care either.
> >
> > I don't think that it will be blocked ever, because
> > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > recovery process will not be stuck on waiting for the WAL.
> >
> > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I 
> > > > > > can not cancel
> > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the 
> > > > > > waiting loop.
> > >
> > > How about this fix? I think users may want to cancel 
> > > pg_is_wal_replay_paused() during
> > > this is blocking.
> >
> > Yeah, we can do this.  I will send the updated patch after putting
> > some more thought into these comments.  Thanks again for the feedback.
> >
>
> Please find the updated patch.

I've looked at the patch. Here are review comments:

+   /* Recovery pause state */
+   RecoveryPauseState  recoveryPause;

Now that the value can have tri-state, how about renaming it to
recoveryPauseState?

---
 bool
 RecoveryIsPaused(void)
+{
+   boolrecoveryPause;
+
+   SpinLockAcquire(>info_lck);
+   recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED) ?
true : false;
+   SpinLockRelease(>info_lck);
+
+   return recoveryPause;
+}
+
+bool
+RecoveryPauseRequested(void)
 {
boolrecoveryPause;

SpinLockAcquire(>info_lck);
-   recoveryPause = XLogCtl->recoveryPause;
+   recoveryPause = (XLogCtl->recoveryPause !=
RECOVERY_IN_PROGRESS) ? true : false;
SpinLockRelease(>info_lck);

return recoveryPause;
 }

We can write like recoveryPause = (XLogCtl->recoveryPause == RECOVERY_PAUSED);

Also, since these functions do the almost same thing, I think we can
have a common function to get XLogCtl->recoveryPause, say
GetRecoveryPauseState() or GetRecoveryPause(), and both
RecoveryIsPaused() and RecoveryPauseRequested() use the returned
value. What do you think?

---
+static void
+CheckAndSetRecoveryPause(void)

Maybe we need to declare the prototype of this function like other
functions in xlog.c.

---
+   /*
+* If recovery is not in progress anymore then report an error this
+* could happen if the standby is promoted while we were waiting for
+* recovery to get paused.
+

Re: Is Recovery actually paused?

2021-01-14 Thread Yugo NAGATA
On Wed, 13 Jan 2021 17:49:43 +0530
Dilip Kumar  wrote:

> On Wed, Jan 13, 2021 at 3:35 PM Dilip Kumar  wrote:
> >
> > On Wed, Jan 13, 2021 at 3:27 PM Yugo NAGATA  wrote:
> > >
> > > On Thu, 10 Dec 2020 11:25:23 +0530
> > > Dilip Kumar  wrote:
> > >
> > > > > > However, I wonder users don't expect pg_is_wal_replay_paused to 
> > > > > > wait.
> > > > > > Especially, if max_standby_streaming_delay is -1, this will be 
> > > > > > blocked forever,
> > > > > > although this setting may not be usual. In addition, some users may 
> > > > > > set
> > > > > > recovery_min_apply_delay for a large.  If such users call 
> > > > > > pg_is_wal_replay_paused,
> > > > > > it could wait for a long time.
> > > > > >
> > > > > > At least, I think we need some descriptions on document to explain
> > > > > > pg_is_wal_replay_paused could wait while a time.
> > > > >
> > > > > Ok
> > > >
> > > > Fixed this, added some comments in .sgml as well as in function header
> > >
> > > Thank you for fixing this.
> > >
> > > Also, is it better to fix the description of pg_wal_replay_pause from
> > > "Pauses recovery." to "Request to pause recovery." in according with
> > > pg_is_wal_replay_paused?
> >
> > Okay
> >
> > >
> > > > > > Also, how about adding a new boolean argument to 
> > > > > > pg_is_wal_replay_paused to
> > > > > > control whether this waits for recovery to get paused or not? By 
> > > > > > setting its
> > > > > > default value to true or false, users can use the old format for 
> > > > > > calling this
> > > > > > and the backward compatibility can be maintained.
> > > > >
> > > > > So basically, if the wait_recovery_pause flag is false then we will
> > > > > immediately return true if the pause is requested?  I agree that it is
> > > > > good to have an API to know whether the recovery pause is requested or
> > > > > not but I am not sure is it good idea to make this API serve both the
> > > > > purpose?  Anyone else have any thoughts on this?
> > > > >
> > >
> > > I think the current pg_is_wal_replay_paused() already has another purpose;
> > > this waits recovery to actually get paused. If we want to limit this API's
> > > purpose only to return the pause state, it seems better to fix this to 
> > > return
> > > the actual state at the cost of lacking the backward compatibility. If we 
> > > want
> > > to know whether pause is requested, we may add a new API like
> > > pg_is_wal_replay_paluse_requeseted(). Also, if we want to wait recovery 
> > > to actually
> > > get paused, we may add an option to pg_wal_replay_pause() for this 
> > > purpose.
> > >
> > > However, this might be a bikeshedding. If anyone don't care that
> > > pg_is_wal_replay_paused() can make user wait for a long time, I don't 
> > > care either.
> >
> > I don't think that it will be blocked ever, because
> > pg_wal_replay_pause is sending the WakeupRecovery() which means the
> > recovery process will not be stuck on waiting for the WAL.

Yes, there is no stuck on waiting for the WAL. However, it can be stuck during 
resolving
a recovery conflict. The process could wait for max_standby_streaming_delay or
max_standby_archive_delay at most before recovery get completely paused.

Also, it could wait for recovery_min_apply_delay if it has a valid value. It is 
possible
that a user set this parameter to a large value, so it could wait for a long 
time. However,
this will be avoided by calling recoveryPausesHere() or 
CheckAndSetRecoveryPause() in
recoveryApplyDelay().

> > > > > > As another comment, while pg_is_wal_replay_paused is blocking, I 
> > > > > > can not cancel
> > > > > > the query. I think CHECK_FOR_INTERRUPTS() is necessary in the 
> > > > > > waiting loop.
> > >
> > > How about this fix? I think users may want to cancel 
> > > pg_is_wal_replay_paused() during
> > > this is blocking.
> >
> > Yeah, we can do this.  I will send the updated patch after putting
> > some more thought into these comments.  Thanks again for the feedback.
> >
> 
> Please find the updated patch.

Thanks.  I confirmed that I can cancel pg_is_wal_repaly_paused() during stuck.


Although it is a very trivial comment, I think that the new line before
HandleStartupProcInterrupts() is unnecessary.

@@ -6052,12 +6062,20 @@ recoveryPausesHere(bool endOfRecovery)
(errmsg("recovery has paused"),
 errhint("Execute pg_wal_replay_resume() to 
continue.")));
 
-   while (RecoveryIsPaused())
+   while (RecoveryPauseRequested())
{
+
HandleStartupProcInterrupts();


Regards,
Yugo Nagata

-- 
Yugo NAGATA 




  1   2   >