Re: please update ps display for recovery checkpoint
On Sun, Jun 06, 2021 at 09:13:48PM -0500, Justin Pryzby wrote: > Putting this into fd.c seems to assume that we can clobber "ps", which is fine > when called by StartupXLOG(), but it's a public interface, so I'm not sure if > it's okay to assume that's the only caller. Maybe it should check if > MyAuxProcType == B_STARTUP. I would be tempted to just add that into StartupXLOG() rather than implying that callers of SyncDataDirectory() are fine to get their ps output enforced all the time. -- Michael signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
On Mon, Jun 07, 2021 at 01:28:06PM -0400, Robert Haas wrote: > On Mon, Jun 7, 2021 at 12:02 PM Bossart, Nathan wrote: >> I've seen a few functions cause lengthy startups, including >> SyncDataDirectory() (for which I was grateful to see 61752afb), >> StartupReorderBuffer(), and RemovePgTempFiles(). I like the idea of >> adding additional information in the ps title, but I also think it is >> worth exploring additional ways to improve on these O(n) startup >> tasks. +1. I also agree with doing something for the ps output of the startup process when these are happening in crash recovery. > See also the nearby thread entitled "when the startup process doesn't" > which touches on this same issue. Here is a link to the thread: https://www.postgresql.org/message-id/ca+tgmoahqrgdfobwgy16xcomtxxsrvgfb2jncvb7-ubuee1...@mail.gmail.com -- Michael signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
On Mon, Jun 7, 2021 at 12:02 PM Bossart, Nathan wrote: > On 6/6/21, 7:14 PM, "Justin Pryzby" wrote: > > Now, I wonder whether the startup process should also include some detail > > about > > "syncing data dir". It's possible to strace the process to see what it's > > doing, but most DBA would probably not know that, and it's helpful to know > > the > > status of recovery and what part of recovery is slow: sync, replay, or > > checkpoint. commit df9274adf improved the situation between replay and > > ckpoint, but it's still not clear what "postgres: startup" means before > > replay > > starts. > > I've seen a few functions cause lengthy startups, including > SyncDataDirectory() (for which I was grateful to see 61752afb), > StartupReorderBuffer(), and RemovePgTempFiles(). I like the idea of > adding additional information in the ps title, but I also think it is > worth exploring additional ways to improve on these O(n) startup > tasks. See also the nearby thread entitled "when the startup process doesn't" which touches on this same issue. -- Robert Haas EDB: http://www.enterprisedb.com
Re: please update ps display for recovery checkpoint
On 6/6/21, 7:14 PM, "Justin Pryzby" wrote: > Now, I wonder whether the startup process should also include some detail > about > "syncing data dir". It's possible to strace the process to see what it's > doing, but most DBA would probably not know that, and it's helpful to know the > status of recovery and what part of recovery is slow: sync, replay, or > checkpoint. commit df9274adf improved the situation between replay and > ckpoint, but it's still not clear what "postgres: startup" means before replay > starts. I've seen a few functions cause lengthy startups, including SyncDataDirectory() (for which I was grateful to see 61752afb), StartupReorderBuffer(), and RemovePgTempFiles(). I like the idea of adding additional information in the ps title, but I also think it is worth exploring additional ways to improve on these O(n) startup tasks. Nathan
Re: please update ps display for recovery checkpoint
On Mon, Dec 14, 2020 at 12:01:33PM +0900, Michael Paquier wrote: > On Sat, Dec 12, 2020 at 12:41:25AM +, Bossart, Nathan wrote: > > On 12/11/20, 4:00 PM, "Michael Paquier" wrote: > >> My counter-proposal is like the attached, with the set/reset part not > >> reversed this time, and the code indented :p > > > > Haha. LGTM. > > Thanks. I have applied this one, then. To refresh: commit df9274adf adds "checkpoint" info to "ps", which previously continued to say "recovering N" even after finishing WAL replay, and throughout the checkpoint. Now, I wonder whether the startup process should also include some detail about "syncing data dir". It's possible to strace the process to see what it's doing, but most DBA would probably not know that, and it's helpful to know the status of recovery and what part of recovery is slow: sync, replay, or checkpoint. commit df9274adf improved the situation between replay and ckpoint, but it's still not clear what "postgres: startup" means before replay starts. There's some interaction between Thomas' commit 61752afb2 and recovery_init_sync_method - if we include a startup message, it should distinguish between "syncing data dirs (fsync)" and (syncfs). Putting this into fd.c seems to assume that we can clobber "ps", which is fine when called by StartupXLOG(), but it's a public interface, so I'm not sure if it's okay to assume that's the only caller. Maybe it should check if MyAuxProcType == B_STARTUP. -- Justin >From a143c40fb2bad96d45f5cc3e22f70dd0e2d6b5c6 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sun, 6 Jun 2021 16:28:15 -0500 Subject: [PATCH] Show "syncing data directories" in ps display.. See also: df9274adf, which shows checkpoint output in ps, and 61752afb2 which adds syncfs. --- src/backend/storage/file/fd.c | 8 1 file changed, 8 insertions(+) diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index e8cd7ef088..692d7f16b7 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@ -99,6 +99,7 @@ #include "storage/fd.h" #include "storage/ipc.h" #include "utils/guc.h" +#include "utils/ps_status.h" #include "utils/resowner_private.h" /* Define PG_FLUSH_DATA_WORKS if we have an implementation for pg_flush_data */ @@ -3374,6 +3375,8 @@ SyncDataDirectory(void) * directories. */ + set_ps_display("syncing data directory (syncfs)"); + /* Sync the top level pgdata directory. */ do_syncfs("."); /* If any tablespaces are configured, sync each of those. */ @@ -3392,10 +3395,13 @@ SyncDataDirectory(void) /* If pg_wal is a symlink, process that too. */ if (xlog_is_symlink) do_syncfs("pg_wal"); + + set_ps_display(""); return; } #endif /* !HAVE_SYNCFS */ + set_ps_display("syncing data directory (fsync)"); /* * If possible, hint to the kernel that we're soon going to fsync the data * directory and its contents. Errors in this step are even less @@ -3421,6 +3427,8 @@ SyncDataDirectory(void) if (xlog_is_symlink) walkdir("pg_wal", datadir_fsync_fname, false, LOG); walkdir("pg_tblspc", datadir_fsync_fname, true, LOG); + + set_ps_display(""); } /* -- 2.17.0
Re: please update ps display for recovery checkpoint
On Sun, Dec 13, 2020 at 09:22:24PM -0600, Justin Pryzby wrote: > I'm not sure, but we could consider backpatching something to clear the > "recovering NNN" that's currently displayed during checkpoint, even though > recovery of NNN has already completed. Possibly just calling > set_ps_display(""); after "Real work is done". This behavior exists for ages and there were not a lot of complaints on this matter, so I see no reason to touch back-branches more than necessary. -- Michael signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
On Mon, Dec 14, 2020 at 12:01:33PM +0900, Michael Paquier wrote: > On Sat, Dec 12, 2020 at 12:41:25AM +, Bossart, Nathan wrote: > > On 12/11/20, 4:00 PM, "Michael Paquier" wrote: > >> My counter-proposal is like the attached, with the set/reset part not > >> reversed this time, and the code indented :p > > > > Haha. LGTM. > > Thanks. I have applied this one, then. Thank you. I'm not sure, but we could consider backpatching something to clear the "recovering NNN" that's currently displayed during checkpoint, even though recovery of NNN has already completed. Possibly just calling set_ps_display(""); after "Real work is done". -- Justin
Re: please update ps display for recovery checkpoint
On Sat, Dec 12, 2020 at 12:41:25AM +, Bossart, Nathan wrote: > On 12/11/20, 4:00 PM, "Michael Paquier" wrote: >> My counter-proposal is like the attached, with the set/reset part not >> reversed this time, and the code indented :p > > Haha. LGTM. Thanks. I have applied this one, then. -- Michael signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
On 12/11/20, 4:00 PM, "Michael Paquier" wrote: > On Fri, Dec 11, 2020 at 06:54:42PM +, Bossart, Nathan wrote: >> This approach seems reasonable to me. I've attached my take on it. > > + /* Reset the process title */ > + set_ps_display(""); > I would still recommend to avoid calling set_ps_display() if there is > no need to so as we avoid useless system calls, so I think that this > stuff had better use a common path for the set and reset logic. > > My counter-proposal is like the attached, with the set/reset part not > reversed this time, and the code indented :p Haha. LGTM. Nathan
Re: please update ps display for recovery checkpoint
On Fri, Dec 11, 2020 at 06:54:42PM +, Bossart, Nathan wrote: > This approach seems reasonable to me. I've attached my take on it. + /* Reset the process title */ + set_ps_display(""); I would still recommend to avoid calling set_ps_display() if there is no need to so as we avoid useless system calls, so I think that this stuff had better use a common path for the set and reset logic. My counter-proposal is like the attached, with the set/reset part not reversed this time, and the code indented :p -- Michael From cb7c41b70ff798870f2eddcaa2782bd47b5fd57f Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Sat, 12 Dec 2020 08:51:48 +0900 Subject: [PATCH v9] Add checkpoint/restartpoint status to ps display. --- src/backend/access/transam/xlog.c | 45 +++ 1 file changed, 45 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 7e81ce4f17..8dd225c2e1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8687,6 +8687,39 @@ UpdateCheckPointDistanceEstimate(uint64 nbytes) (0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes); } +/* + * Update the ps display for a process running a checkpoint. Note that + * this routine should not do any allocations so as it can be called + * from a critical section. + */ +static void +update_checkpoint_display(int flags, bool restartpoint, bool reset) +{ + /* + * The status is reported only for end-of-recovery and shutdown + * checkpoints or shutdown restartpoints. Updating the ps display is + * useful in those situations as it may not be possible to rely on + * pg_stat_activity to see the status of the checkpointer or the startup + * process. + */ + if ((flags & (CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IS_SHUTDOWN)) == 0) + return; + + if (reset) + set_ps_display(""); + else + { + char activitymsg[128]; + + snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s", + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", + restartpoint ? "restartpoint" : "checkpoint"); + set_ps_display(activitymsg); + } +} + + /* * Perform a checkpoint --- either during shutdown, or on-the-fly * @@ -8905,6 +8938,9 @@ CreateCheckPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags, false); + /* Update the process title */ + update_checkpoint_display(flags, false, false); + TRACE_POSTGRESQL_CHECKPOINT_START(flags); /* @@ -9120,6 +9156,9 @@ CreateCheckPoint(int flags) /* Real work is done, but log and update stats before releasing lock. */ LogCheckpointEnd(false); + /* Reset the process title */ + update_checkpoint_display(flags, false, true); + TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written, NBuffers, CheckpointStats.ckpt_segs_added, @@ -9374,6 +9413,9 @@ CreateRestartPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags, true); + /* Update the process title */ + update_checkpoint_display(flags, true, false); + CheckPointGuts(lastCheckPoint.redo, flags); /* @@ -9492,6 +9534,9 @@ CreateRestartPoint(int flags) /* Real work is done, but log and update before releasing lock. */ LogCheckpointEnd(true); + /* Reset the process title */ + update_checkpoint_display(flags, true, true); + xtime = GetLatestXTime(); ereport((log_checkpoints ? LOG : DEBUG2), (errmsg("recovery restart point at %X/%X", -- 2.29.2 signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
On Thu, Dec 10, 2020 at 10:02:10PM -0600, Justin Pryzby wrote: > Isn't the sense of "reset" inverted ? It is ;p -- Michael signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
Isn't the sense of "reset" inverted ? I think maybe you mean to do set_ps_display(""); in the "if reset". On Fri, Dec 11, 2020 at 12:54:22PM +0900, Michael Paquier wrote: > +update_checkpoint_display(int flags, bool restartpoint, bool reset) > +{ > + if (reset) > + { > + char activitymsg[128]; > + > + snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s", > + (flags & CHECKPOINT_END_OF_RECOVERY) ? > "end-of-recovery " : "", > + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " > : "", > + restartpoint ? "restartpoint" : "checkpoint"); > + set_ps_display(activitymsg); > + } > + else > + set_ps_display(""); > +} > + > + > /* > * Perform a checkpoint --- either during shutdown, or on-the-fly > * > @@ -8905,6 +8937,9 @@ CreateCheckPoint(int flags) > if (log_checkpoints) > LogCheckpointStart(flags, false); > > + /* Update the process title */ > + update_checkpoint_display(flags, false, false); > + > TRACE_POSTGRESQL_CHECKPOINT_START(flags); > > /* > @@ -9120,6 +9155,9 @@ CreateCheckPoint(int flags) > /* Real work is done, but log and update stats before releasing lock. */ > LogCheckpointEnd(false); > > + /* Reset the process title */ > + update_checkpoint_display(flags, false, true); > + > TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written, > > NBuffers, > > CheckpointStats.ckpt_segs_added, > @@ -9374,6 +9412,9 @@ CreateRestartPoint(int flags) > if (log_checkpoints) > LogCheckpointStart(flags, true); > > + /* Update the process title */ > + update_checkpoint_display(flags, true, false); > + > CheckPointGuts(lastCheckPoint.redo, flags); > > /* > @@ -9492,6 +9533,9 @@ CreateRestartPoint(int flags) > /* Real work is done, but log and update before releasing lock. */ > LogCheckpointEnd(true); > > + /* Reset the process title */ > + update_checkpoint_display(flags, true, true); > + > xtime = GetLatestXTime(); > ereport((log_checkpoints ? LOG : DEBUG2), > (errmsg("recovery restart point at %X/%X",
Re: please update ps display for recovery checkpoint
On Wed, Dec 09, 2020 at 06:37:22PM +, Bossart, Nathan wrote: > On 12/8/20, 10:16 PM, "Michael Paquier" wrote: >> Yeah, I'd rather leave out the part of the patch where we have the >> checkpointer say "idle". So I still think that what v3 did upthread, >> by just resetting the ps display in all cases, feels more natural. So >> we should remove the parts of v5 from checkpointer.c. > > That seems fine to me. I think it is most important that the end-of- > recovery and shutdown checkpoints are shown. I'm not sure there's > much value in updating the process title for automatic checkpoints and > checkpoints triggered via the CHECKPOINT command, so IMO we could skip > those, too. I made these changes in the new version of the patch. It would be possible to use pg_stat_activity in most cases here, so I agree to settle down to the minimum we can agree on for now, and maybe discuss separately if this should be extended in some or another in the future if there is a use-case for that. So I'd say that what you have here is logically fine. > > + snprintf(activitymsg, sizeof(activitymsg), "creating %s%scheckpoint", > > [...] > > + snprintf(activitymsg, sizeof(activitymsg), "creating %srestartpoint", > > FWIW, I still fing "running" to sound better than "creating" here. An > > extra term I can think of that could be adapted is "performing". > > I think I prefer "performing" over "running" because that's what the > docs use [0]. I've changed it to "performing" in the new version of > the patch. That's also used in the code comments, FWIW. @@ -9282,6 +9296,7 @@ CreateRestartPoint(int flags) XLogRecPtr endptr; XLogSegNo _logSegNo; TimestampTz xtime; + boolshutdown = (flags & CHECKPOINT_IS_SHUTDOWN); You are right that CHECKPOINT_END_OF_RECOVERY should not be called for a restart point, so that's correct. However, I think that it would be better to have all those four code paths controlled by a single routine, where we pass down the checkpoint flags and fill in the ps string directly depending on what the caller wants to do. This way, we will avoid any inconsistent updates if this stuff gets extended in the future, and there will be all the information at hand to extend the logic. So I have simplified your patch as per the attached. Thoughts? -- Michael From 98440b74b10c522bd485e072c43631ce78d115a7 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Fri, 11 Dec 2020 12:53:36 +0900 Subject: [PATCH v7] Add checkpoint/restartpoint status to ps display. --- src/backend/access/transam/xlog.c | 44 +++ 1 file changed, 44 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 7e81ce4f17..8d0a5e05fc 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8687,6 +8687,38 @@ UpdateCheckPointDistanceEstimate(uint64 nbytes) (0.90 * CheckPointDistanceEstimate + 0.10 * (double) nbytes); } +/* + * Update the ps display for a process running a checkpoint. Note that + * this routine should not do any allocations so as it can be called + * from a critical section. + */ +static void +update_checkpoint_display(int flags, bool restartpoint, bool reset) +{ + /* + * The status is reported only for end-of-recovery and shutdown checkpoints + * or shutdown restartpoints. Updating the ps display is useful in those + * situations as it may not be possible to rely on pg_stat_activity to see + * the status of the checkpointer or the startup process. + */ + if ((flags & (CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IS_SHUTDOWN)) == 0) + return; + + if (reset) + { + char activitymsg[128]; + + snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s", + (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", + (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", + restartpoint ? "restartpoint" : "checkpoint"); + set_ps_display(activitymsg); + } + else + set_ps_display(""); +} + + /* * Perform a checkpoint --- either during shutdown, or on-the-fly * @@ -8905,6 +8937,9 @@ CreateCheckPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags, false); + /* Update the process title */ + update_checkpoint_display(flags, false, false); + TRACE_POSTGRESQL_CHECKPOINT_START(flags); /* @@ -9120,6 +9155,9 @@ CreateCheckPoint(int flags) /* Real work is done, but log and update stats before releasing lock. */ LogCheckpointEnd(false); + /* Reset the process title */ + update_checkpoint_display(flags, false, true); + TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written, NBuffers, CheckpointStats.ckpt_segs_added, @@ -9374,6 +9412,9 @@ CreateRestartPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags, true); + /* Update the process title */ + update_checkpoint_display(flags, true, false); + CheckPointGuts(lastCheckPoint.redo, flags); /* @@ -9492,6 +9533,9 @@ CreateRestartPoint(int f
Re: please update ps display for recovery checkpoint
On Wed, Dec 09, 2020 at 02:00:44AM +0900, Fujii Masao wrote: > I agree it might be helpful to display something like > "checkpointing" or "waiting for checkpoint" in PS title for the > startup process. Thanks. > But, at least for me, it seems strange to display "idle" only for > checkpointer. Yeah, I'd rather leave out the part of the patch where we have the checkpointer say "idle". So I still think that what v3 did upthread, by just resetting the ps display in all cases, feels more natural. So we should remove the parts of v5 from checkpointer.c. +* Reset the ps status display. We set the status to "idle" for the +* checkpointer process, and we clear it for anything else that runs this +* code. +*/ + if (MyBackendType == B_CHECKPOINTER) + set_ps_display("idle"); + else + set_ps_display(""); Regarding this part, this just needs a reset with an empty string. The second sentence is confusing (it partially comes fronm v3, from me..). Do we lose anything by removing the second sentence of this comment? + snprintf(activitymsg, sizeof(activitymsg), "creating %s%scheckpoint", [...] + snprintf(activitymsg, sizeof(activitymsg), "creating %srestartpoint", FWIW, I still fing "running" to sound better than "creating" here. An extra term I can think of that could be adapted is "performing". > *If* we want to monitor the current status of > checkpointer, it should be shown as wait event in pg_stat_activity, > instead? That would be WAIT_EVENT_CHECKPOINTER_MAIN, now there has been also on this thread an argument that you would not have access to pg_stat_activity until crash recovery completes and consistency is restored. -- Michael signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
On 2020/12/05 2:17, Bossart, Nathan wrote: On 12/3/20, 1:19 PM, "Bossart, Nathan" wrote: I like the idea behind this patch. I wrote a new version with a couple of changes: I missed an #include in this patch. Here's a new one with that fixed. I agree it might be helpful to display something like "checkpointing" or "waiting for checkpoint" in PS title for the startup process. But, at least for me, it seems strange to display "idle" only for checkpointer. *If* we want to monitor the current status of checkpointer, it should be shown as wait event in pg_stat_activity, instead? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: please update ps display for recovery checkpoint
On 12/3/20, 1:58 PM, "Justin Pryzby" wrote: > On Thu, Dec 03, 2020 at 09:18:07PM +, Bossart, Nathan wrote: >> I considered also checking that update_process_title was enabled, but >> I figured that these ps display updates should happen sparsely enough >> that it wouldn't make much of an impact. > > Since bf68b79e5, update_ps_display is responsible for checking > update_process_title. Its other, remaining uses are apparently just acting as > minor optimizations to guard against useless snprintf's. > > See also > https://www.postgresql.org/message-id/flat/1288021.1600178478%40sss.pgh.pa.us > in which (I just saw) Tom wrote: > >> Seems like a good argument, but you'd have to be careful about the >> final state when you stop overriding update_process_title --- it can't >> be left looking like it's still-in-progress on some random WAL file. > > I think that's a live problem, not just a concern for that patch. > It was exactly my complaint leading to this thread: > >> But runs a checkpoint, which can take a long time, while the "ps" display >> still >> says "recovering ". Ah, I see. Thanks for pointing this out. Nathan
Re: please update ps display for recovery checkpoint
On Thu, Dec 03, 2020 at 09:18:07PM +, Bossart, Nathan wrote: > I considered also checking that update_process_title was enabled, but > I figured that these ps display updates should happen sparsely enough > that it wouldn't make much of an impact. Since bf68b79e5, update_ps_display is responsible for checking update_process_title. Its other, remaining uses are apparently just acting as minor optimizations to guard against useless snprintf's. See also https://www.postgresql.org/message-id/flat/1288021.1600178478%40sss.pgh.pa.us in which (I just saw) Tom wrote: > Seems like a good argument, but you'd have to be careful about the > final state when you stop overriding update_process_title --- it can't > be left looking like it's still-in-progress on some random WAL file. I think that's a live problem, not just a concern for that patch. It was exactly my complaint leading to this thread: > But runs a checkpoint, which can take a long time, while the "ps" display > still > says "recovering ". -- Justin
Re: please update ps display for recovery checkpoint
On Fri, Oct 02, 2020 at 04:28:14PM +0900, Michael Paquier wrote: > > Related: I have always thought that this message meant "recovery will > > complete > > Real Soon", but I now understand it to mean "beginning the recovery > > checkpoint, > > which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time). > > Yep. And at the end of crash recovery seconds feel like minutes. > > I agree that "checkpointer checkpoint" is not the best fit. Using > parenthesis would also be inconsistent with the other usages of this > API in the backend code. I think maybe I got the idea for parenthesis from these: src/backend/tcop/postgres.c:set_ps_display("idle in transaction (aborted)"); src/backend/postmaster/postmaster.c-if (port->remote_port[0] != '\0') src/backend/postmaster/postmaster.c-appendStringInfo(&ps_data, "(%s)", port->remote_port); > What about adding "running" then? This > would give "checkpointer running end-of-recovery checkpoint". What about one of these? "checkpointer: running end-of-recovery checkpoint" "checkpointer running: end-of-recovery checkpoint" Thanks for looking. -- Justin
Re: please update ps display for recovery checkpoint
On Sat, Sep 19, 2020 at 11:00:31AM -0500, Justin Pryzby wrote: > Maybe it's a bad idea if the checkpointer is continuously changing its > display. > I don't see the utility in it, since log_checkpoints does more than ps could > ever do. I'm concerned that would break things for someone using something > like pgrep. At the end of recovery, there is a code path where the startup process triggers the checkpoint by itself if the bgwriter is not launched, but there is also a second code path where, if the bgwriter is started and if the cluster not promoted, the startup process would request for an immediate checkpoint and then wait for it. It is IMO equally important to update the display of the checkpointer in this case to show that the checkpointer is running an end-of-recovery checkpoint. > Related: I have always thought that this message meant "recovery will complete > Real Soon", but I now understand it to mean "beginning the recovery > checkpoint, > which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time). Yep. And at the end of crash recovery seconds feel like minutes. I agree that "checkpointer checkpoint" is not the best fit. Using parenthesis would also be inconsistent with the other usages of this API in the backend code. What about adding "running" then? This would give "checkpointer running end-of-recovery checkpoint". While looking at this patch, I got tempted to use a StringInfo to fill in the string to display as that would make the addition of any extra information easier, giving the attached. -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8f11b1b9de..e25c29e66c 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8714,6 +8714,7 @@ CreateCheckPoint(int flags) XLogRecPtr last_important_lsn; VirtualTransactionId *vxids; int nvxids; + StringInfoData ps_status; /* * An end-of-recovery checkpoint is really a shutdown checkpoint, just @@ -8756,6 +8757,21 @@ CreateCheckPoint(int flags) MemSet(&CheckpointStats, 0, sizeof(CheckpointStats)); CheckpointStats.ckpt_start_t = GetCurrentTimestamp(); + + /* + * Set string to use for the ps status display. This cannot be + * built within a critical section, and the display is set when + * the checkpoint is really starting. + */ + initStringInfo(&ps_status); + appendStringInfoString(&ps_status, "running "); + + if ((flags & CHECKPOINT_END_OF_RECOVERY) != 0) + appendStringInfoString(&ps_status, "end-of-recovery "); + if ((flags & CHECKPOINT_IS_SHUTDOWN) != 0) + appendStringInfoString(&ps_status, "shutdown "); + appendStringInfoString(&ps_status, "checkpoint"); + /* * Use a critical section to force system panic if we have trouble. */ @@ -8889,6 +8905,13 @@ CreateCheckPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags, false); + /* + * Set ps display status, step postponed until now as the checkpoint + * could have been skipped. + */ + set_ps_display(ps_status.data); + pfree(ps_status.data); + TRACE_POSTGRESQL_CHECKPOINT_START(flags); /* @@ -9104,6 +9127,13 @@ CreateCheckPoint(int flags) /* Real work is done, but log and update stats before releasing lock. */ LogCheckpointEnd(false); + /* + * Reset the ps status display. "idle" or similar is not used here + * as this can be called in the startup process, which could be + * confusing. + */ + set_ps_display(""); + TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written, NBuffers, CheckpointStats.ckpt_segs_added, @@ -9266,6 +9296,7 @@ CreateRestartPoint(int flags) XLogRecPtr endptr; XLogSegNo _logSegNo; TimestampTz xtime; + StringInfoData ps_status; /* * Acquire CheckpointLock to ensure only one restartpoint or checkpoint @@ -9358,6 +9389,16 @@ CreateRestartPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags, true); + /* Set ps status display for the process running the checkpoint */ + initStringInfo(&ps_status); + appendStringInfoString(&ps_status, "running "); + if ((flags & CHECKPOINT_IS_SHUTDOWN) != 0) + appendStringInfoString(&ps_status, "shutdown "); + appendStringInfoString(&ps_status, "restartpoint"); + + set_ps_display(ps_status.data); + pfree(ps_status.data); + CheckPointGuts(lastCheckPoint.redo, flags); /* @@ -9476,6 +9517,13 @@ CreateRestartPoint(int flags) /* Real work is done, but log and update before releasing lock. */ LogCheckpointEnd(true); + /* + * Reset the ps status display. "idle" or similar is not used here + * as this can be called in the startup process, which could be + * confusing. + */ + set_ps_display(""); + xtime = GetLatestXTime(); ereport((log_checkpoints ? LOG : DEBUG2), (errmsg("recovery restart point at %X/%X", signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
On Thu, Sep 10, 2020 at 01:37:10PM +0900, Michael Paquier wrote: > On Wed, Sep 09, 2020 at 09:00:50PM -0500, Justin Pryzby wrote: > > What would you want the checkpointer's ps to say ? > > > > Normally it just says: > > postgres 3468 3151 0 Aug27 ?00:20:57 postgres: checkpointer > > > > Note that CreateCheckPoint() can also be called from the startup > process if the bgwriter has not been launched once recovery finishes. > > > Or do you mean do the same thing as now, but one layer lower, like: > > > > @@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags) > > + if (flags & CHECKPOINT_END_OF_RECOVERY) > > + set_ps_display("recovery checkpoint"); > > For the use-case discussed here, that would be fine. Now the > difficult point is how much information we can actually display > without bloating ps while still have something meaningful. Showing > all the information from LogCheckpointStart() would bloat the output a > lot for example. So, thinking about that, my take would be to have ps > display the following at the beginning of CreateCheckpoint() and > CreateRestartPoint(): > - restartpoint or checkpoint > - shutdown > - end-of-recovery > > The output also needs to be cleared once the routines finish or if > there is a skip, of course. In my inital request, I *only* care about the startup process' recovery checkpoint. AFAIK, this exits when it's done, so there may be no need to "revert" to the previous "ps". However, one could argue that it's currently a bug that the "recovering NNN" portion isn't updated after finishing the WAL files. StartupXLOG -> xlogreader -> XLogPageRead -> WaitForWALToBecomeAvailable -> XLogFileReadAnyTLI -> XLogFileRead -> CreateCheckPoint Maybe it's a bad idea if the checkpointer is continuously changing its display. I don't see the utility in it, since log_checkpoints does more than ps could ever do. I'm concerned that would break things for someone using something like pgrep. |$ ps -wwf `pgrep -f 'checkpointer *$'` |UIDPID PPID C STIME TTY STAT TIME CMD |postgres 9434 9418 0 Aug20 ?Ss 214:25 postgres: checkpointer |pryzbyj 23010 23007 0 10:33 ?00:00:00 postgres: checkpointer checkpoint I think this one is by far the most common, but somewhat confusing, since it's only one word. This led me to put parenthesis around it: |pryzbyj 26810 26809 82 10:53 ?00:00:12 postgres: startup (end-of-recovery checkpoint) Related: I have always thought that this message meant "recovery will complete Real Soon", but I now understand it to mean "beginning the recovery checkpoint, which is flagged CHECKPOINT_IMMEDIATE" (and may take a long time). |2020-09-19 10:53:26.345 CDT [26810] LOG: checkpoint starting: end-of-recovery immediate -- Justin >From d1d473beb4b59a93ceb7859f4776da47d9381aee Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 19 Sep 2020 10:12:53 -0500 Subject: [PATCH v2 1/4] Clear PS display after recovery ...so it doesn't continue to say 'recovering ...' during checkpoint --- src/backend/access/transam/xlog.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 61754312e2..0183cae9a9 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -7396,6 +7396,7 @@ StartupXLOG(void) /* * end of main redo apply loop */ + set_ps_display(""); if (reachedRecoveryTarget) { -- 2.17.0 >From 0e0bbeb6501f5b1a7a5f0e4109dbf8a8152de249 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 8 Feb 2020 09:16:14 -0600 Subject: [PATCH v2 2/4] Update PS display following replay of last xlog.. ..otherwise it shows "recovering " for the duration of the recovery checkpoint. --- src/backend/access/transam/xlog.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 0183cae9a9..3d8220ba78 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -8892,6 +8892,13 @@ CreateCheckPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags, false); + if (flags & CHECKPOINT_END_OF_RECOVERY) + set_ps_display("(end-of-recovery checkpoint)"); + else if (flags & CHECKPOINT_IS_SHUTDOWN) + set_ps_display("(shutdown checkpoint)"); + else + set_ps_display("(checkpoint)"); + TRACE_POSTGRESQL_CHECKPOINT_START(flags); /* @@ -9106,6 +9113,7 @@ CreateCheckPoint(int flags) /* Real work is done, but log and update stats before releasing lock. */ LogCheckpointEnd(false); + set_ps_display(""); TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written, NBuffers, @@ -9349,6 +9357,11 @@ CreateRestartPoint(int flags) if (log_checkpoints) LogCheckpointStart(flags, true); + if (flags & CHECKPOINT_IS_SHUTDOWN) + set_ps_display("(shutdown rest
Re: please update ps display for recovery checkpoint
On Wed, Sep 09, 2020 at 09:00:50PM -0500, Justin Pryzby wrote: > What would you want the checkpointer's ps to say ? > > Normally it just says: > postgres 3468 3151 0 Aug27 ?00:20:57 postgres: checkpointer > Note that CreateCheckPoint() can also be called from the startup process if the bgwriter has not been launched once recovery finishes. > Or do you mean do the same thing as now, but one layer lower, like: > > @@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags) > + if (flags & CHECKPOINT_END_OF_RECOVERY) > + set_ps_display("recovery checkpoint"); For the use-case discussed here, that would be fine. Now the difficult point is how much information we can actually display without bloating ps while still have something meaningful. Showing all the information from LogCheckpointStart() would bloat the output a lot for example. So, thinking about that, my take would be to have ps display the following at the beginning of CreateCheckpoint() and CreateRestartPoint(): - restartpoint or checkpoint - shutdown - end-of-recovery The output also needs to be cleared once the routines finish or if there is a skip, of course. -- Michael signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
On Mon, Aug 31, 2020 at 03:52:44PM +0900, Michael Paquier wrote: > On Thu, Aug 20, 2020 at 05:09:05PM +0900, Michael Paquier wrote: > > That could be helpful. Wouldn't it be better to use "end-of-recovery > > checkpoint" instead? That's the common wording in the code comments. > > > > I don't see the point of patch 0002. In the same paragraph, we > > already know that this applies to any checkpoints. > > Thinking more about this.. Could it be better to just add some calls > to set_ps_display() directly in CreateCheckPoint()? This way, both > the checkpointer as well as the startup process at the end of recovery > would benefit from the change. What would you want the checkpointer's ps to say ? Normally it just says: postgres 3468 3151 0 Aug27 ?00:20:57 postgres: checkpointer Or do you mean do the same thing as now, but one layer lower, like: @@ -8728,6 +8725,9 @@ CreateCheckPoint(int flags) + if (flags & CHECKPOINT_END_OF_RECOVERY) + set_ps_display("recovery checkpoint"); -- Justin
Re: please update ps display for recovery checkpoint
On Thu, Aug 20, 2020 at 05:09:05PM +0900, Michael Paquier wrote: > That could be helpful. Wouldn't it be better to use "end-of-recovery > checkpoint" instead? That's the common wording in the code comments. > > I don't see the point of patch 0002. In the same paragraph, we > already know that this applies to any checkpoints. Thinking more about this.. Could it be better to just add some calls to set_ps_display() directly in CreateCheckPoint()? This way, both the checkpointer as well as the startup process at the end of recovery would benefit from the change. -- Michael signature.asc Description: PGP signature
Re: please update ps display for recovery checkpoint
On Wed, Aug 19, 2020 at 12:20:50AM +, k.jami...@fujitsu.com wrote: > On Wednesday, August 19, 2020 7:53 AM (GMT+9), Justin Pryzby wrote: >> During crash recovery, the server writes this to log: >> Please change to say "recovery checkpoint" or similar, as I mentioned here. >> https://www.postgresql.org/message-id/2020011820.GP26045@telsasoft.c >> om > > Yes, I agree that it is helpful to tell users about that. That could be helpful. Wouldn't it be better to use "end-of-recovery checkpoint" instead? That's the common wording in the code comments. I don't see the point of patch 0002. In the same paragraph, we already know that this applies to any checkpoints. > About 0003 patch, there are similar phrases in bgwriter_flush_after and > backend_flush_after. Should those be updated too? Yep, makes sense. -- Michael signature.asc Description: PGP signature
RE: please update ps display for recovery checkpoint
On Wednesday, August 19, 2020 7:53 AM (GMT+9), Justin Pryzby wrote: Hi, All the patches apply, although when applying them the following appears: (Stripping trailing CRs from patch; use --binary to disable.) > During crash recovery, the server writes this to log: > > < 2020-08-16 08:46:08.601 -03 >LOG: redo done at 2299C/1EC6BA00 < > 2020-08-16 08:46:08.877 -03 >LOG: checkpoint starting: end-of-recovery > immediate > > But runs a checkpoint, which can take a long time, while the "ps" display > still says > "recovering ". > > Please change to say "recovery checkpoint" or similar, as I mentioned here. > https://www.postgresql.org/message-id/2020011820.GP26045@telsasoft.c > om Yes, I agree that it is helpful to tell users about that. About 0003 patch, there are similar phrases in bgwriter_flush_after and backend_flush_after. Should those be updated too? --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3170,7 +3170,7 @@ include_dir 'conf.d' limit the amount of dirty data in the kernel's page cache, reducing the likelihood of stalls when an fsync is issued at the end of the checkpoint, or when the OS writes data back in larger batches in the -background. Often that will result in greatly reduced transaction +background. This feature will often result in greatly reduced transaction latency, but there also are some cases, especially with workloads that are bigger than , but smaller than the OS's page cache, where performance might degrade. This Regards, Kirk Jamison