On 2020/03/26 14:33, Masahiko Sawada wrote:
On Tue, 24 Mar 2020 at 17:04, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
On 2020/03/05 20:16, Fujii Masao wrote:
On 2020/03/05 16:58, Masahiko Sawada wrote:
On Wed, 4 Mar 2020 at 15:21, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
On 2020/03/04 14:31, Masahiko Sawada wrote:
On Wed, 4 Mar 2020 at 13:48, Fujii Masao <masao.fu...@oss.nttdata.com> wrote:
On 2020/03/04 13:27, Michael Paquier wrote:
On Wed, Mar 04, 2020 at 01:13:19PM +0900, Masahiko Sawada wrote:
Yeah, so 0001 patch sets existing wait events to recovery conflict
resolution. For instance, it sets (PG_WAIT_LOCK | LOCKTAG_TRANSACTION)
to the recovery conflict on a snapshot. 0003 patch improves these wait
events by adding the new type of wait event such as
WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT. Therefore 0001 (and 0002) patch
is the fix for existing versions and 0003 patch is an improvement for
only PG13. Did you mean even 0001 patch doesn't fit for back-patching?
Yes, it looks like a improvement rather than bug fix.
Okay, understand.
I got my eyes on this patch set. The full patch set is in my opinion
just a set of improvements, and not bug fixes, so I would refrain from
back-backpatching.
I think that the issue (i.e., "waiting" is reported twice needlessly
in PS display) that 0002 patch tries to fix is a bug. So it should be
fixed even in the back branches.
So we need only two patches: one fixes process title issue and another
improve wait event. I've attached updated patches.
Thanks for updating the patches! I started reading 0001 patch.
Thank you for reviewing this patch.
- /*
- * Report via ps if we have been waiting for more than
500 msec
- * (should that be configurable?)
- */
- if (update_process_title && new_status == NULL &&
- TimestampDifferenceExceeds(waitStart,
GetCurrentTimestamp(),
-
500))
The patch changes ResolveRecoveryConflictWithSnapshot() and
ResolveRecoveryConflictWithTablespace() so that they always add
"waiting" into the PS display, whether wait is really necessary or not.
But isn't it better to display "waiting" in PS basically when wait is
necessary, like originally ResolveRecoveryConflictWithVirtualXIDs()
does as the above?
You're right. Will fix it.
ResolveRecoveryConflictWithDatabase(Oid dbid)
{
+ char *new_status = NULL;
+
+ /* Report via ps we are waiting */
+ new_status = set_process_title_waiting();
In ResolveRecoveryConflictWithDatabase(), there seems no need to
display "waiting" in PS because no wait occurs when recovery conflict
with database happens.
Isn't the startup process waiting for other backend to terminate?
Yeah, you're right. I agree that "waiting" should be reported in this case.
On second thought, in recovery conflict case, "waiting" should be reported
while waiting for the specified delay (e.g., by max_standby_streaming_delay)
until the conflict is resolved. So IMO reporting "waiting" in the case of
recovery conflict with buffer pin, snapshot, lock and tablespace seems valid,
because they are user-visible "expected" wait time.
However, in the case of recovery conflict with database, the recovery
basically doesn't wait at all and just terminates the conflicting sessions
immediately. Then the recovery waits for all those sessions to be terminated,
but that wait time is basically small and should not be the user-visible.
If that wait time becomes very long because of unresponsive backend, ISTM
that LOG or WARNING should be logged instead of reporting something in
PS display. I'm not sure if that logging is really necessary now, though.
Therefore, I'm thinking that "waiting" doesn't need to be reported in the case
of recovery conflict with database. Thought?
Fair enough. The longer wait time of conflicts with database is not
user-expected behaviour so logging would be better. I'd like to just
drop the change around ResolveRecoveryConflictWithDatabase() from the
patch.
Make sense.
+ if (update_process_title)
+ waitStart = GetCurrentTimestamp();
Since LockBufferForCleanup() can be called very frequently,
I don't think that it's good thing to call GetCurrentTimestamp()
every time when LockBufferForCleanup() is called.
+ /* Report via ps if we have been waiting for more than 500 msec
*/
+ if (update_process_title && new_status == NULL &&
+ TimestampDifferenceExceeds(waitStart,
GetCurrentTimestamp(),
+ 500))
Do we really want to see "waiting" in PS even in non hot standby mode?
If max_standby_streaming_delay and deadlock_timeout are set to large value,
ResolveRecoveryConflictWithBufferPin() can wait for a long time, e.g.,
more than 500ms. In that case, I'm afraid that "report if we've been
waiting for more than 500ms" logic doesn't work.
So I'm now thinking that it's better to report "waiting" immdiately before
ResolveRecoveryConflictWithBufferPin(). Of course, we can still use
"report if we've been waiting for more than 500ms" logic by teaching 500ms
to ResolveRecoveryConflictWithBufferPin() as the minimum wait time.
But this looks overkill. Thought?
Based on the above comments, I updated the patch. Attached. Right now
the patch looks very simple. Could you review this patch?
Regards,
--
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters
diff --git a/src/backend/storage/buffer/bufmgr.c
b/src/backend/storage/buffer/bufmgr.c
index e05e2b3456..e72d607a23 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -49,6 +49,7 @@
#include "storage/proc.h"
#include "storage/smgr.h"
#include "storage/standby.h"
+#include "utils/ps_status.h"
#include "utils/rel.h"
#include "utils/resowner_private.h"
#include "utils/timestamp.h"
@@ -3616,6 +3617,7 @@ void
LockBufferForCleanup(Buffer buffer)
{
BufferDesc *bufHdr;
+ char *new_status = NULL;
Assert(BufferIsValid(buffer));
Assert(PinCountWaitBuf == NULL);
@@ -3650,6 +3652,13 @@ LockBufferForCleanup(Buffer buffer)
{
/* Successfully acquired exclusive lock with pincount 1
*/
UnlockBufHdr(bufHdr, buf_state);
+
+ /* Report change to non-waiting status */
+ if (new_status)
+ {
+ set_ps_display(new_status);
+ pfree(new_status);
+ }
return;
}
/* Failed, so mark myself as waiting for pincount 1 */
@@ -3668,6 +3677,20 @@ LockBufferForCleanup(Buffer buffer)
/* Wait to be signaled by UnpinBuffer() */
if (InHotStandby)
{
+ /* Report change to waiting status */
+ if (update_process_title && new_status == NULL)
+ {
+ const char *old_status;
+ int len;
+
+ old_status = get_ps_display(&len);
+ new_status = (char *) palloc(len + 8 + 1);
+ memcpy(new_status, old_status, len);
+ strcpy(new_status + len, " waiting");
+ set_ps_display(new_status);
+ new_status[len] = '\0'; /* truncate off "
waiting" */
+ }
+
/* Publish the bufid that Startup process waits on */
SetStartupBufferPinWaitBufId(buffer - 1);
/* Set alarm and then wait to be signaled by
UnpinBuffer() */