Hi, Thanks for looking into this.
On Fri, Mar 6, 2026 at 3:32 PM Chao Li <[email protected]> wrote: > > > > > On Mar 5, 2026, at 21:01, Xuneng Zhou <[email protected]> wrote: > > > > Hi, > > > > On Thu, Mar 5, 2026 at 3:37 PM Xuneng Zhou <[email protected]> wrote: > >> > >> Hi Michael, > >> > >> Thanks for the detailed feedback! > >> > >> On Thu, Mar 5, 2026 at 11:58 AM Michael Paquier <[email protected]> > >> wrote: > >>> > >>> On Wed, Mar 04, 2026 at 09:02:29AM +0800, Xuneng Zhou wrote: > >>>> Just rebase. > >>> > >>> I have applied 0001, that simply moves some code around. > >>> > >> > >> Thanks for applying. > >> > >>> Regarding 0002, I would recommend to not bump the catalog version in > >>> catversion.h when sending a patch to the lists. This task is left to > >>> committers when the code gets merged into the tree. And this is > >>> annoying for submitters because it can create a lot of conflicts. > >> > >> I'll leave it untouched. > >> > >>> Using a set-returning function is I think wrong here, betraying the > >>> representation of the recovery status as stored in the system. We > >>> know that there is only one state of recovery, fixed in shared memory. > >>> Like the cousins of this new function, let's make thinks non-SRF, > >>> returning one row all the time with PG_RETURN_NULL() if the conditions > >>> for information display are not satisfied. When we are not in > >>> recovery or when the role querying the function is not granted > >>> ROLE_PG_READ_ALL_STATS, that will simplify the patch as there is no > >>> need for the else branch with the nulls, as written in your patch. > >>> The field values are acquired the right way, spinlock acquisitions > >>> have to be short. > >> > >> My earlier thought for keeping pg_stat_get_recovery as an SRF is to > >> make pg_stat_recovery simpler by avoiding an extra filter such as > >> WHERE s.promote_triggered IS NOT NULL to preserve 0/1-row semantics. > >> pg_stat_get_recovery_prefetch also uses the SRF pattern. > >> > >>> Like pg_stat_wal_receiver, let's add to the view definition a qual to > >>> return a row only if the fields are not NULL. > >>> > >>> pg_get_wal_replay_pause_state() displays the pause_state, and it is > >>> not hidden behind the stats read role. I am not really convinced that > >>> this is worth bothering to treat as an exception in this patch. It > >>> makes it a bit more complicated, for little gain. I would just hide > >>> all the fields behind the role granted, to keep the code simpler, even > >>> if that's slightly inconsistent with pg_get_wal_replay_pause_state(). > >> > >> I agree that exposing a subset of columns unconditionally is not worth > >> the added complexity. > >> > >>> After writing this last point, as far as I can see there is a small > >>> optimization possible in the patch. When a role is not granted > >>> ROLE_PG_READ_ALL_STATS, we know that we will not return any > >>> information so we could skip the spinlock acquisition and avoid > >>> spinlock acquisitions when one queries the function but has no access > >>> to its data. > >> > >> Makes sense to me. I'll apply it as suggested. > >> > >>> + True if a promotion has been triggered for this standby server. > >>> > >>> Standby servers are implied if data is returned, this sentence can be > >>> simpler and cut the "standby server" part. > >> > >> + 1 > >> > >>> + Start LSN of the last WAL record replayed during recovery. > >>> [...] > >>> + End LSN of the last WAL record replayed during recovery. > >>> [...] > >>> + Timeline of the last replayed WAL record. > >>> For other system views with LSN information, we don't use "LSN", but > >>> "write-ahead log location". I'd suggest the same term here. These > >>> three fields refer to the last record *successfully* replayed. It > >>> seems important to mention this fact, I guess? > >> > >> I'll replace them. > >> > >>> + <structfield>replay_end_lsn</structfield> <type>pg_lsn</type> > >>> + </para> > >>> + <para> > >>> + Current replay position. When replaying a record, this is the end > >>> + position of the record being replayed; otherwise it equals > >>> + <structfield>last_replayed_end_lsn</structfield>. > >>> > >>> Slightly inexact. This is the end LSN + 1. > >> > >> Yeh, this needs to be corrected. > >> > >>> + <structfield>replay_end_lsn</structfield> <type>pg_lsn</type> > >>> [..] > >>> + <structfield>replay_end_tli</structfield> <type>integer</type> > >>> > >>> Okay, I can fall behind the addition of these two, it could be helpful > >>> in debugging something like a locking issue when replaying a specific > >>> record, I guess. At least I'd want to know what is happening for a > >>> record currently being replayed. It seems to me that this could be > >>> more precise, mentioning that these refer to a record *currently* > >>> being replayed. > >> > >> I will adjust the docs to say these describe the record currently > >> being replayed, with replay_end_lsn being the end position + 1. > >> > >>> Is recoveryLastXTime actually that relevant to have? We use it for > >>> logging and for recovery target times, which is something less > >>> relevant than the other fields, especially if we think about standbys > >>> where these have no targets to reach. > >> > >> I agree it is less central for standby monitoring (and partly overlaps > >> with pg_last_xact_replay_timestamp()), so I’ll remove it from this > >> view in the next revision. > >> > >>> currentChunkStartTime, on the contrary, is much more relevant to me, > >>> due to the fact that we use it in WaitForWALToBecomeAvailable() with > >>> active WAL receivers running. > >> > >> Yeah, it could be useful for apply-delay/catch-up diagnostics. > >> > > > > Here is the updated patch set. Please take a look. > > > > > > -- > > Best, > > Xuneng > > <v3-0001-Add-pg_stat_recovery-system-view.patch><v3-0002-Refactor-move-XLogSource-enum-to-xlogrecovery.h.patch><v3-0003-Add-wal_source-column-to-pg_stat_recovery.patch> > > 0001 has been pushed. > > 0002 is pure refactoring and only moves the structure XLogSource from .h to > .c, thus no comment. > > A few comments on 0003. > > 1 > ``` > + /* > + * Source from which the startup process most recently read WAL data. > + * Updated when the startup process successfully reads WAL from a > source. > + * Note: this reflects the read source, not the original receipt > source; > + * streamed WAL read from local files will show XLOG_FROM_PG_WAL. > + */ > + XLogSource lastReadSource; > ``` > > This comment says, "streamed WAL read from local files will show > XLOG_FROM_PG_WAL”. But I don’t see how the conversion happens. In > WaitForWALToBecomeAvailable(), there is a path XLogFileRead() is called with > XLOG_FROM_STREAM, but in XLogFileRead(), source is directly assigned to > XLogRecoveryCtl->lastReadSource without conversion. > > On the other side, if “stream” is impossible, then the doc should not mention > it: > ``` > + <para> > + <literal>stream</literal>: WAL actively being streamed from the > + upstream server. > + </para> > ``` you're right, the comment is wrong. There is no conversion. In WaitForWALToBecomeAvailable(), when the walreceiver has flushed data ahead of the startup process, the code deliberately opens the segment file with XLOG_FROM_STREAM , and the existing comment explains the intent: * Use XLOG_FROM_STREAM so that source info is set correctly * and XLogReceiptTime isn't changed. XLogFileRead() then assigns that directly to lastReadSource without conversion. So XLOG_FROM_STREAM is a valid observable value — it means the segment was opened during active walreceiver streaming, even though the actual I/O is against a local file in pg_wal. I think stream documentation entry should stay. > 2 Related to 1. WRT the new column name wal_source, it sounds like “where WAL > is coming from”. Say the comment of lastReadSource is true, WAL from stream > is shown as pg_wal, “wal source” sounds more like stream, and “wal read > source” is pg_wal. From this perspective, I just feel the new column name > should be something like “wal_read_source”. > agreed, wal_read_source seems to be a better name. It more closely mirrors the underlying lastReadSource field and avoids ambiguity about whether the column describes the delivery mechanism or the read path. Please check the updated patches. -- Best, Xuneng
From 86b6e0bac90ba6c9464525c80e74879cc06f7c4d Mon Sep 17 00:00:00 2001 From: alterego655 <[email protected]> Date: Tue, 27 Jan 2026 13:59:29 +0800 Subject: [PATCH v4 1/2] Refactor: move XLogSource enum to xlogrecovery.h Move the XLogSource enum definition from xlogrecovery.c to the public header xlogrecovery.h to make it available for external use. This is preparation for exposing WAL source information via the pg_stat_recovery view. The xlogSourceNames array remains in the implementation file as it's only used for debugging output. No functional change. --- src/backend/access/transam/xlogrecovery.c | 12 ------------ src/include/access/xlogrecovery.h | 12 ++++++++++++ 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index fbddd7e522c..11915ab3de0 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -204,18 +204,6 @@ typedef struct XLogPageReadPrivate /* flag to tell XLogPageRead that we have started replaying */ static bool InRedo = false; -/* - * Codes indicating where we got a WAL file from during recovery, or where - * to attempt to get one. - */ -typedef enum -{ - XLOG_FROM_ANY = 0, /* request to read WAL from any source */ - XLOG_FROM_ARCHIVE, /* restored using restore_command */ - XLOG_FROM_PG_WAL, /* existing file in pg_wal */ - XLOG_FROM_STREAM, /* streamed from primary */ -} XLogSource; - /* human-readable names for XLogSources, for debugging output */ static const char *const xlogSourceNames[] = {"any", "archive", "pg_wal", "stream"}; diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h index 2842106b285..ba714a78280 100644 --- a/src/include/access/xlogrecovery.h +++ b/src/include/access/xlogrecovery.h @@ -60,6 +60,18 @@ typedef enum RecoveryPauseState RECOVERY_PAUSED, /* recovery is paused */ } RecoveryPauseState; +/* + * Codes indicating where we got a WAL file from during recovery, or where + * to attempt to get one. + */ +typedef enum XLogSource +{ + XLOG_FROM_ANY = 0, /* request to read WAL from any source */ + XLOG_FROM_ARCHIVE, /* restored using restore_command */ + XLOG_FROM_PG_WAL, /* existing file in pg_wal */ + XLOG_FROM_STREAM, /* streamed from primary */ +} XLogSource; + /* * Shared-memory state for WAL recovery. */ -- 2.51.0
From 2760c4e2b37a53a6d78410fd0661021d3ff1a871 Mon Sep 17 00:00:00 2001 From: alterego655 <[email protected]> Date: Tue, 27 Jan 2026 14:02:52 +0800 Subject: [PATCH v4 2/2] Add wal_source column to pg_stat_recovery Extend pg_stat_recovery with a wal_source column that shows where the startup process most recently read WAL data from: 'archive', 'pg_wal', or 'stream'. This helps diagnose recovery behavior: - Detecting streaming vs archive fallback transitions - Monitoring initial standby catch-up progress - Troubleshooting replication lag sources The column reflects the current read source, not the original delivery mechanism. Streamed WAL that is subsequently read from local files shows 'pg_wal'. NULL if no WAL has been read yet. --- doc/src/sgml/monitoring.sgml | 36 +++++++++++++++++++++++ src/backend/access/transam/xlogfuncs.c | 18 ++++++++++++ src/backend/access/transam/xlogrecovery.c | 6 ++++ src/backend/catalog/system_views.sql | 3 +- src/include/access/xlogrecovery.h | 10 +++++++ src/include/catalog/pg_proc.dat | 6 ++-- src/test/regress/expected/rules.out | 5 ++-- 7 files changed, 78 insertions(+), 6 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index b3d53550688..08132f3970a 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -2057,6 +2057,42 @@ description | Waiting for a newly initialized WAL file to reach durable storage </entry> </row> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>wal_read_source</structfield> <type>text</type> + </para> + <para> + Source from which the startup process most recently read WAL data. + Possible values are: + </para> + <itemizedlist> + <listitem> + <para> + <literal>archive</literal>: WAL restored using + <varname>restore_command</varname>. + </para> + </listitem> + <listitem> + <para> + <literal>pg_wal</literal>: WAL read from local + <filename>pg_wal</filename> directory. + </para> + </listitem> + <listitem> + <para> + <literal>stream</literal>: WAL actively being streamed from the + upstream server. + </para> + </listitem> + </itemizedlist> + <para> + NULL if no WAL has been read yet. Note that this reflects the + current read source, not the original delivery mechanism; streamed + WAL that is subsequently read from local files will show + <literal>pg_wal</literal>. + </para></entry> + </row> + </tbody> </tgroup> </table> diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c index 80007783067..76e9fb00df3 100644 --- a/src/backend/access/transam/xlogfuncs.c +++ b/src/backend/access/transam/xlogfuncs.c @@ -789,6 +789,7 @@ pg_stat_get_recovery(PG_FUNCTION_ARGS) TimestampTz recovery_last_xact_time; TimestampTz current_chunk_start_time; RecoveryPauseState pause_state; + XLogSource wal_read_source; if (!RecoveryInProgress()) PG_RETURN_NULL(); @@ -807,6 +808,7 @@ pg_stat_get_recovery(PG_FUNCTION_ARGS) recovery_last_xact_time = XLogRecoveryCtl->recoveryLastXTime; current_chunk_start_time = XLogRecoveryCtl->currentChunkStartTime; pause_state = XLogRecoveryCtl->recoveryPauseState; + wal_read_source = XLogRecoveryCtl->lastReadSource; SpinLockRelease(&XLogRecoveryCtl->info_lck); if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE) @@ -854,5 +856,21 @@ pg_stat_get_recovery(PG_FUNCTION_ARGS) values[8] = CStringGetTextDatum(GetRecoveryPauseStateString(pause_state)); + switch (wal_read_source) + { + case XLOG_FROM_ANY: + nulls[8] = true; + break; + case XLOG_FROM_ARCHIVE: + values[8] = CStringGetTextDatum("archive"); + break; + case XLOG_FROM_PG_WAL: + values[8] = CStringGetTextDatum("pg_wal"); + break; + case XLOG_FROM_STREAM: + values[8] = CStringGetTextDatum("stream"); + break; + } + PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls))); } diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 11915ab3de0..54b2979db8b 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -397,6 +397,7 @@ XLogRecoveryShmemInit(void) memset(XLogRecoveryCtl, 0, sizeof(XLogRecoveryCtlData)); SpinLockInit(&XLogRecoveryCtl->info_lck); + XLogRecoveryCtl->lastReadSource = XLOG_FROM_ANY; InitSharedLatch(&XLogRecoveryCtl->recoveryWakeupLatch); ConditionVariableInit(&XLogRecoveryCtl->recoveryNotPausedCV); } @@ -4249,6 +4250,11 @@ XLogFileRead(XLogSegNo segno, TimeLineID tli, if (source != XLOG_FROM_STREAM) XLogReceiptTime = GetCurrentTimestamp(); + /* Update shared memory for external visibility */ + SpinLockAcquire(&XLogRecoveryCtl->info_lck); + XLogRecoveryCtl->lastReadSource = source; + SpinLockRelease(&XLogRecoveryCtl->info_lck); + return fd; } if (errno != ENOENT || !notfoundOk) /* unexpected failure? */ diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 2eda7d80d02..6f52d54b1fa 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1008,7 +1008,8 @@ CREATE VIEW pg_stat_recovery AS s.replay_end_tli, s.recovery_last_xact_time, s.current_chunk_start_time, - s.pause_state + s.pause_state, + s.wal_read_source FROM pg_stat_get_recovery() s WHERE s.promote_triggered IS NOT NULL; diff --git a/src/include/access/xlogrecovery.h b/src/include/access/xlogrecovery.h index ba714a78280..04cffc33510 100644 --- a/src/include/access/xlogrecovery.h +++ b/src/include/access/xlogrecovery.h @@ -132,6 +132,16 @@ typedef struct XLogRecoveryCtlData RecoveryPauseState recoveryPauseState; ConditionVariable recoveryNotPausedCV; + /* + * Source from which the startup process most recently read WAL data: + * XLOG_FROM_ARCHIVE - retrieved via archive recovery + * XLOG_FROM_PG_WAL - read from pg_wal outside active streaming + * XLOG_FROM_STREAM - opened during active walreceiver streaming + * (I/O still reads a local pg_wal file) + * XLOG_FROM_ANY - initial/unknown before first successful read + */ + XLogSource lastReadSource; + slock_t info_lck; /* locks shared variables shown above */ } XLogRecoveryCtlData; diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 361e2cfffeb..ffad581e616 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5704,9 +5704,9 @@ { oid => '9949', descr => 'statistics: information about WAL recovery', proname => 'pg_stat_get_recovery', proisstrict => 'f', provolatile => 's', proparallel => 'r', prorettype => 'record', proargtypes => '', - proallargtypes => '{bool,pg_lsn,pg_lsn,int4,pg_lsn,int4,timestamptz,timestamptz,text}', - proargmodes => '{o,o,o,o,o,o,o,o,o}', - proargnames => '{promote_triggered,last_replayed_read_lsn,last_replayed_end_lsn,last_replayed_tli,replay_end_lsn,replay_end_tli,recovery_last_xact_time,current_chunk_start_time,pause_state}', + proallargtypes => '{bool,pg_lsn,pg_lsn,int4,pg_lsn,int4,timestamptz,timestamptz,text,text}', + proargmodes => '{o,o,o,o,o,o,o,o,o,o}', + proargnames => '{promote_triggered,last_replayed_read_lsn,last_replayed_end_lsn,last_replayed_tli,replay_end_lsn,replay_end_tli,recovery_last_xact_time,current_chunk_start_time,pause_state,wal_read_source}', prosrc => 'pg_stat_get_recovery' }, { oid => '6169', descr => 'statistics: information about replication slot', proname => 'pg_stat_get_replication_slot', provolatile => 's', diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index deb6e2ad6a9..e15fd98ed9f 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -2135,8 +2135,9 @@ pg_stat_recovery| SELECT promote_triggered, replay_end_tli, recovery_last_xact_time, current_chunk_start_time, - pause_state - FROM pg_stat_get_recovery() s(promote_triggered, last_replayed_read_lsn, last_replayed_end_lsn, last_replayed_tli, replay_end_lsn, replay_end_tli, recovery_last_xact_time, current_chunk_start_time, pause_state) + pause_state, + wal_read_source + FROM pg_stat_get_recovery() s(promote_triggered, last_replayed_read_lsn, last_replayed_end_lsn, last_replayed_tli, replay_end_lsn, replay_end_tli, recovery_last_xact_time, current_chunk_start_time, pause_state, wal_read_source) WHERE (promote_triggered IS NOT NULL); pg_stat_recovery_prefetch| SELECT stats_reset, prefetch, -- 2.51.0
