At Mon, 25 Jan 2021 10:05:19 +0530, Dilip Kumar <dilipbal...@gmail.com> wrote in > On Mon, Jan 25, 2021 at 8:42 AM Kyotaro Horiguchi > <horikyota....@gmail.com> wrote: > > > > At Sun, 24 Jan 2021 14:26:08 +0530, Dilip Kumar <dilipbal...@gmail.com> > > wrote in > > > On Sun, Jan 24, 2021 at 12:16 PM Bharath Rupireddy > > > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > > > > > On Sun, Jan 24, 2021 at 7:17 AM Dilip Kumar <dilipbal...@gmail.com> > > > > wrote: > > > > > On Sat, 23 Jan 2021 at 4:40 PM, Bharath Rupireddy > > > > > <bharath.rupireddyforpostg...@gmail.com> wrote: > > > > >> > > > > >> On Sat, Jan 23, 2021 at 1:36 PM Dilip Kumar <dilipbal...@gmail.com> > > > > >> wrote: > > > > >> > Please find the patch for the same. I haven't added a test case > > > > >> > for > > > > >> > this yet. I mean we can write a test case to pause the recovery > > > > >> > and > > > > >> > get the status. But I am not sure that we can really write a > > > > >> > reliable > > > > >> > test case for 'pause requested' and 'paused'. > > > > >> > > > > >> +1 to just show the recovery pause state in the output of > > > > >> pg_is_wal_replay_paused. But, should the function name > > > > >> "pg_is_wal_replay_paused" be something like > > > > >> "pg_get_wal_replay_pause_state" or some other? To me, when "is" > > > > >> exists > > > > >> in a function, I expect a boolean output. Others may have better > > > > >> thoughts. > > > > >> > > > > >> IIUC the above change, ensuring the recovery is paused after it's > > > > >> requested lies with the user. IMHO, the main problem we are trying to > > > > >> solve is not met > > > > > > > > > > > > > > > Basically earlier their was no way for the user yo know whether the > > > > > recovery is actually paused or not because it was always returning > > > > > true after pause requested. Now, we will return whether pause > > > > > requested or actually paused. So > for tool designer who want to > > > > > wait for recovery to get paused can have a loop and wait until the > > > > > recovery state reaches to paused. That will give a better control. > > > > > > > > I get it and I agree to have that change. My point was whether we can > > > > have a new function pg_wal_replay_pause_and_wait that waits until > > > > recovery is actually paused ((along with pg_is_wal_replay_paused > > > > returning the actual state than a true/false) so that tool developers > > > > don't need to have the waiting code outside, if at all they care about > > > > it? Others may have better thoughts than me. > > > > > > I think the previous patch was based on that idea where we thought > > > that we can pass an argument to pg_is_wal_replay_paused which can > > > decide whether to wait or return without the wait. I think this > > > version looks better to me where we give the status instead of > > > waiting. I am not sure whether we want another version of > > > pg_wal_replay_pause which will wait for actually it to get paused. I > > > mean there is always a scope to include the functionality in the > > > database which can be achieved by the tool, but this patch was trying > > > to solve the problem that there was no way to know the status so I > > > think returning the correct status should be the scope of this. > > > > I understand that the requirement here is that no record is replayed > > after pg_wal_replay_pause() is returned, or pg_is_wal_replay_paused() > > returns true, and delays taken while recovery don't delay the state > > change. That requirements are really synchronous. > > > > On the other hand the machinery is designed to be asynchronous. > > > > > * Note that we intentionally don't take the info_lck spinlock > > > * here. We might therefore read a slightly stale value of > > > * the recoveryPause flag, but it can't be very stale (no > > > * worse than the last spinlock we did acquire). Since a > > > * pause request is a pretty asynchronous thing anyway, > > > * possibly responding to it one WAL record later than we > > > * otherwise would is a minor issue, so it doesn't seem worth > > > * adding another spinlock cycle to prevent that. > > > > As the result, this patch tries to introduce several new checkpoints > > to some delaying point so that that waits can find pause request in a > > timely manner. I think we had better use locking (or atomics) for the > > information instead of such scattered checkpoints if we expect that > > machinery to work in a such syhcnronous manner. > > > > That would make the tri-state state variable and many checkpoints > > unnecessary. Maybe. > > I don't think the intention was so to make it synchronous, I think > the main intention was that pg_is_wal_replay_paused can return us the > correct state, in short user can know that whether any more wal will > be replayed after pg_is_wal_replay_paused returns true or some other > state. I agree that along with that we have also introduced some
I meant that kind of correctness in a time-series by using the word "synchronous". So it can be implemented both by adopting many checkpoints and by just makeing the state-change synchronous. > extra checkpoint where the recovery process is waiting for WAL and > apply delay and from the pg_wal_replay_pause we had sent a signal to > wakeup the recovery process. So I am not sure is it worth adding the > lock/atomic variable to make this synchronous. Any other thoughts on > this? +1 There're only one reader process (startup) and at-most (in the sane usage) one writer process (the caller to pg_wal_replay_pause) so the chance of conflicting is negligibely low. However, I'm not sure how much penalty non-conflicted atomic updates/read imposes on performance.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center