At Mon, 25 Jan 2021 10:05:19 +0530, Dilip Kumar <dilipbal...@gmail.com> wrote 
in 
> On Mon, Jan 25, 2021 at 8:42 AM Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote:
> >
> > At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar <dilipbal...@gmail.com> 
> > wrote in
> > > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy
> > > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > > >
> > > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar <dilipbal...@gmail.com> 
> > > > wrote:
> > > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy 
> > > > > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > > > >>
> > > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar <dilipbal...@gmail.com> 
> > > > >> wrote:
> > > > >> > Please find the patch for the same.  I haven't added a test case 
> > > > >> > for
> > > > >> > this yet.  I mean we can write a test case to pause the recovery 
> > > > >> > and
> > > > >> > get the status.  But I am not sure that we can really write a 
> > > > >> > reliable
> > > > >> > test case for 'pause requested' and 'paused'.
> > > > >>
> > > > >> +1 to just show the recovery pause state in the output of
> > > > >> pg_is_wal_replay_paused. But, should the function name
> > > > >> "pg_is_wal_replay_paused" be something like
> > > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" 
> > > > >> exists
> > > > >> in a function, I expect a boolean output. Others may have better
> > > > >> thoughts.
> > > > >>
> > > > >> IIUC the above change, ensuring the recovery is paused after it's
> > > > >> requested lies with the user. IMHO, the main problem we are trying to
> > > > >> solve is not met
> > > > >
> > > > >
> > > > > Basically earlier their was no way for the user yo know whether the 
> > > > > recovery is actually paused or not because it was always returning 
> > > > > true after pause requested.  Now, we will return whether pause 
> > > > > requested or actually paused.  So > for tool designer who want to 
> > > > > wait for recovery to get paused can have a loop and wait until the 
> > > > > recovery state reaches to paused.  That will give a better control.
> > > >
> > > > I get it and I agree to have that change. My point was whether we can
> > > > have a new function pg_wal_replay_pause_and_wait that waits until
> > > > recovery is actually paused ((along with pg_is_wal_replay_paused
> > > > returning the actual state than a true/false) so that tool developers
> > > > don't need to have the waiting code outside, if at all they care about
> > > > it? Others may have better thoughts than me.
> > >
> > > I think the previous patch was based on that idea where we thought
> > > that we can pass an argument to pg_is_wal_replay_paused which can
> > > decide whether to wait or return without the wait.  I think this
> > > version looks better to me where we give the status instead of
> > > waiting.  I am not sure whether we want another version of
> > > pg_wal_replay_pause which will wait for actually it to get paused.  I
> > > mean there is always a scope to include the functionality in the
> > > database which can be achieved by the tool, but this patch was trying
> > > to solve the problem that there was no way to know the status so I
> > > think returning the correct status should be the scope of this.
> >
> > I understand that the requirement here is that no record is replayed
> > after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused()
> > returns true, and delays taken while recovery don't delay the state
> > change. That requirements are really synchronous.
> >
> > On the other hand the machinery is designed to be asynchronous.
> >
> > >        * Note that we intentionally don't take the info_lck spinlock
> > >        * here.  We might therefore read a slightly stale value of
> > >        * the recoveryPause flag, but it can't be very stale (no
> > >        * worse than the last spinlock we did acquire).  Since a
> > >        * pause request is a pretty asynchronous thing anyway,
> > >        * possibly responding to it one WAL record later than we
> > >        * otherwise would is a minor issue, so it doesn't seem worth
> > >        * adding another spinlock cycle to prevent that.
> >
> > As the result, this patch tries to introduce several new checkpoints
> > to some delaying point so that that waits can find pause request in a
> > timely manner. I think we had better use locking (or atomics) for the
> > information instead of such scattered checkpoints if we expect that
> > machinery to work in a such syhcnronous manner.
> >
> > That would make the tri-state state variable and many checkpoints
> > unnecessary. Maybe.
> 
> I don't think the intention was so to make it synchronous,  I think
> the main intention was that pg_is_wal_replay_paused can return us the
> correct state, in short user can know that whether any more wal will
> be replayed after pg_is_wal_replay_paused returns true or some other
> state.  I agree that along with that we have also introduced some

I meant that kind of correctness in a time-series by using the word
"synchronous". So it can be implemented both by adopting many
checkpoints and by just makeing the state-change synchronous.

> extra checkpoint where the recovery process is waiting for WAL and
> apply delay and from the pg_wal_replay_pause we had sent a signal to
> wakeup the recovery process.  So I am not sure is it worth adding the

> lock/atomic variable to make this synchronous.  Any other thoughts on
> this?

+1

There're only one reader process (startup) and at-most (in the sane
usage) one writer process (the caller to pg_wal_replay_pause) so the
chance of conflicting is negligibely low.  However, I'm not sure how
much penalty non-conflicted atomic updates/read imposes on
performance..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


Reply via email to