Hi,

Thanks for the review!

Current status of the patch is:
- 'WAL read' stats in xlogrecovery.c are added to pg_stat_io.
- IOCONTEXT_INIT is added to count 'WAL init'. 'WAL init' stats are
added to pg_stat_io.
- pg_stat_io shows different op_bytes for the IOOBJECT_WAL operations.
- Working on which 'BackendType / IOContext / IOOp' should be banned
in pg_stat_io.
- Working on adding 'WAL read' to the xlogreader.c and walsender.c.
- PendingWalStats.wal_sync and
PendingWalStats.wal_write_time/PendingWalStats.wal_sync_time are moved
to pgstat_count_io_op_n()/pgstat_count_io_op_time() respectively.

TODOs:
- Documentation.
- Thinking about how to interpret the different IOOps within the two
IOContexts and discussing what would be useful to count.
- Decide which 'BackendType / IOContext / IOOp' should not be tracked.
- Adding 'WAL read' to the xlogreader.c and walsender.c. (This could
be an another patch)
- Adding WAIT_EVENT_WAL_COPY_* operations to pg_stat_io if needed.
(This could be an another patch)

On Sat, 22 Jul 2023 at 01:30, Melanie Plageman
<melanieplage...@gmail.com> wrote:
> I think it would be good to count WAL reads even though they are not
> currently represented in pg_stat_wal. Here is a thread discussing this
> [1].

I used the same implementation in the thread link [1]. I added 'WAL
read' to only xlogrecovery.c for now. I didn't add 'WAL read' to
xlogreader.c and walsender.c because they cause some failures on:
'!pgStatLocal.shmem->is_shutdown' asserts. I will spend more time on
these. Also, I added Bharath to CC. I have a question about 'WAL
read':
1. There are two places where 'WAL read' happens.
a. In WALRead() in xlogreader.c, it reads 'count' bytes, most of the
time count is equal to XLOG_BLCKSZ but there are some cases it is not.
For example
- in XLogSendPhysical() in walsender.c WALRead() is called by nbytes
- in WALDumpReadPage() in pg_waldump.c WALRead() is called by count
These nbytes and count variables could be different from XLOG_BLCKSZ.

b. in XLogPageRead() in xlogreader.c, it reads exactly XLOG_BLCKSZ bytes:
pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);

So, what should op_bytes be set to for 'WAL read' operations?

> Eventually, the docs will need an update as well. You can wait until a
> later version of the patch to do this, but I would include it in a list
> of the remaining TODOs in your next version.

Done. I shared TODOs at the top.

> I think we will also want to add an IOContext for WAL initialization.
> Then we can track how long is spent doing 'WAL init' (including filling
> the WAL file with zeroes). XLogFileInitInternal() is likely where we
> would want to add it. And op_bytes for this would likely be
> wal_segment_size. I thought I heard about someone proposing adding WAL
> init to pg_stat_wal, but I can't find the thread.

Done. I created a new IOCONTEXT_INIT IOContext for the 'WAL init'. I
have a question there:
1. Some of the WAL processes happens at initdb (standalone backend
IOCONTEXT_NORMAL/(IOOP_READ & IOOP_WRITE) and
IOCONTEXT_INIT/(IOOP_WRITE & IOOP_FSYNC)). Since this happens at the
initdb, AFAIK there is no way to set 'track_wal_io_timing' and
'track_io_timing' variables there. So, their timings appear as 0.
Should I use IsBootstrapProcessingMode() to enable WAL io timings at
the initdb or are they not that much important?

> I think there is also an argument for counting WAL files recycled as
> IOOP_REUSES. We should start thinking about how to interpret the
> different IOOps within the two IOContexts and discussing what would be
> useful to count. For example, should removing a logfile count as an
> IOOP_EVICT? Maybe it is not directly related to "IO" enough or even an
> interesting statistic, but we should think about what kinds of
> IO-related WAL statistics we want to track.

I added that to TODOs.

> Any that we decide not to count for now should be "banned" in
> pgstat_tracks_io_op() for clarity. For example, if we create a separate
> IOContext for WAL file init, I'm not sure what would count as an
> IOOP_EXTEND in IOCONTEXT_NORMAL for IOOBJECT_WAL.
>
> Also, I think there are some backend types which will not generate WAL
> and we should determine which those are and skip those rows in
> pgstat_tracks_io_object().

I agree, I am working on this. I have a couple of questions:
1. Can client backend and background worker do IOCONTEXT_NORMAL/IOOP_READ?
2. Is there an easy way to check if 'BackendType / IOOBJECT_WAL' does
specific IOOp operations?

> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c
> index 8b0710abe6..2ee6c21398 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -2207,6 +2207,10 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID
> tli, bool flexible)
>
> I think we should likely follow the pattern of using
> pgstat_prepare_io_time() and pgstat_count_io_op_time() as it is done
> elsewhere. You could pass the IOObject as a parameter to
> pgstat_prepare_io_time() in order to determine if we should check
> track_io_timing or track_wal_io_timing. And we'll want to check
> track_wal_io_timing if IOObject is IOOBJECT_WAL in
> pgstat_count_io_op_time().

Done. Instead of passing parameters to pgstat_prepare_io_time(), I
used a slightly different implementation. I return the current time if
there is a chance that any 'time' can be tracked.

>                      INSTR_TIME_SET_CURRENT(duration);
>
> INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration,
> start);
>
> +                    pgstat_count_io_op_time(IOOBJECT_WAL,
> IOCONTEXT_NORMAL, IOOP_WRITE, start, 1);
> +                } else
> +                {
>
> Other users of pgstat_count_io_op_time()/io_op_n() which write multiple
> pages at a time pass the number of pages in as the cnt parameter. (see
> ExtendBufferedRelLocal() as an example). I think we want to do that for
> WAL also. In this case, it would be the local variable "npages" and we
> can do it outside of this loop.
>
> It is true that the existing WAL stats count wal_writes here. However,
> this is essentially counting write system calls, which is probably not
> what we want for pg_stat_io. See [2] for a discussion about whether to
> count blocks written back or writeback system calls for a previous
> pg_stat_io feature. All of the other block-based IO statistics in
> pg_stat_io count the number of blocks.
>
> This being said, we probably want to just leave
> PendingWalStats.wal_write++ here. We would normally move it into
> pg_stat_io like we have with pgBufferUsage and the db IO stats that are
> updated in pgstat_count_io_op_time(). This consolidation makes it easier
> to eventually reduce the duplication. However, in this case, it seems
> wal_write counts something we don't count in pg_stat_io, so it can
> probably be left here. I would still move the
> PendingWalStats.wal_write_time into pgstat_count_io_op_time(), since
> that seems like it is the same as what will be in pg_stat_io.

Done. I moved PendingWalStats.wal_sync and
PendingWalStats.wal_write_time/PendingWalStats.wal_sync_time to
pgstat_count_io_op_n()/pgstat_count_io_op_time() respectively. Because
of this change, pg_stat_wal's and pg_stat_io's
IOOBJECT_WAL/IOCONTEXT_NORMAL/IOOP_WRITE counts are different but the
rest are the same.

> Also, op_bytes for IOOBJECT_WAL/IOCONTEXT_NORMAL should be XLOG_BLCKSZ
> (see comment in pg_stat_get_io() in pgstatfuncs.c). Those default to the
> same value but can be made to be different.

Done.

> I would wrap this line and check other lines to make sure they are not
> too long.

Done.

>
> +    } else
> +    {
> +        pgstat_count_io_op_n(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC, 1);
>      }
>
>      PendingWalStats.wal_sync++;
>
> Same feedback as above about using the prepare/count pattern used for
> pg_stat_io elsewhere. In this case, you should be able to move
> PendingWalStats.wal_sync into there as well.

Done.

> > There is a track_io_timing variable to control pg_stat_io
> > timings and a track_wal_io_timing variable to control WAL timings. I
> > couldn't decide on which logic to enable WAL timings on pg_stat_io.
> > For now, both pg_stat_io and track_wal_io_timing are needed to be
> > enabled to track WAL timings in pg_stat_io.
>
> Hmm. I could see a case where someone doesn't want to incur the
> overhead of track_io_timing for regular IO but does want to do so for
> WAL because they are interested in a specific issue. I'm not sure
> though. I could be convinced otherwise (based on relative overhead,
> etc).

Done. IOOBJECT_WAL uses track_wal_io_timing regardless of
track_io_timing for now.

> [1] 
> https://www.postgresql.org/message-id/flat/20230216191138.jotc73lqb7xhfqbi%40awork3.anarazel.de#eb4a641427fa1eb013e9ecdd8648e640
> [2] 
> https://www.postgresql.org/message-id/20230504165738.4e2hfoddoels542c%40awork3.anarazel.de

In addition to these, are WAIT_EVENT_WAL_COPY_* operations needed to
be added to pg_stat_io? If the answer is yes, should I add them to the
current patch?

Any kind of feedback would be appreciated.

Regards,
Nazir Bilal Yavuz
Microsoft
From 574fdec6ed8073dbc49053e6933db0310c7c62f5 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavu...@gmail.com>
Date: Thu, 3 Aug 2023 16:11:16 +0300
Subject: [PATCH v2] Show WAL stats on pg_stat_io

This patch aims to showing WAL stats per backend on pg_stat_io view.

With this patch, it can be seen how many WAL operations it makes, their
context, types and total timings per backend in pg_stat_io view.
---
 src/backend/access/transam/xlog.c         | 58 ++++++----------
 src/backend/access/transam/xlogrecovery.c | 17 +++++
 src/backend/utils/activity/pgstat_io.c    | 80 +++++++++++++++++++++--
 src/backend/utils/adt/pgstatfuncs.c       |  4 +-
 src/include/pgstat.h                      | 16 ++++-
 5 files changed, 131 insertions(+), 44 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 60c0b7ec3af..ee7b85e18ca 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2188,38 +2188,22 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 			Size		nbytes;
 			Size		nleft;
 			int			written;
-			instr_time	start;
+			instr_time	io_start;
 
 			/* OK to write the page(s) */
 			from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
 			nbytes = npages * (Size) XLOG_BLCKSZ;
 			nleft = nbytes;
+
+			io_start = pgstat_prepare_io_time();
 			do
 			{
 				errno = 0;
 
-				/* Measure I/O timing to write WAL data */
-				if (track_wal_io_timing)
-					INSTR_TIME_SET_CURRENT(start);
-				else
-					INSTR_TIME_SET_ZERO(start);
-
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
 				pgstat_report_wait_end();
 
-				/*
-				 * Increment the I/O timing and the number of times WAL data
-				 * were written out to disk.
-				 */
-				if (track_wal_io_timing)
-				{
-					instr_time	duration;
-
-					INSTR_TIME_SET_CURRENT(duration);
-					INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, start);
-				}
-
 				PendingWalStats.wal_write++;
 
 				if (written <= 0)
@@ -2245,6 +2229,9 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				startoffset += written;
 			} while (nleft > 0);
 
+			pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
+									IOOP_WRITE, io_start, npages);
+
 			npages = 0;
 
 			/*
@@ -2938,6 +2925,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	int			fd;
 	int			save_errno;
 	int			open_flags = O_RDWR | O_CREAT | O_EXCL | PG_BINARY;
+	instr_time	io_start;
 
 	Assert(logtli != 0);
 
@@ -2981,6 +2969,8 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 				(errcode_for_file_access(),
 				 errmsg("could not create file \"%s\": %m", tmppath)));
 
+	io_start = pgstat_prepare_io_time();
+
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
 	save_errno = 0;
 	if (wal_init_zero)
@@ -3016,6 +3006,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	}
 	pgstat_report_wait_end();
 
+	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, IOOP_WRITE,
+						io_start, 1);
+
 	if (save_errno)
 	{
 		/*
@@ -3032,6 +3025,8 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 				 errmsg("could not write to file \"%s\": %m", tmppath)));
 	}
 
+	io_start = pgstat_prepare_io_time();
+
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
 	{
@@ -3044,6 +3039,9 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	}
 	pgstat_report_wait_end();
 
+	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT,
+							IOOP_FSYNC, io_start, 1);
+
 	if (close(fd) != 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
@@ -8174,7 +8172,7 @@ void
 issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 {
 	char	   *msg = NULL;
-	instr_time	start;
+	instr_time	io_start;
 
 	Assert(tli != 0);
 
@@ -8187,11 +8185,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 		sync_method == SYNC_METHOD_OPEN_DSYNC)
 		return;
 
-	/* Measure I/O timing to sync the WAL file */
-	if (track_wal_io_timing)
-		INSTR_TIME_SET_CURRENT(start);
-	else
-		INSTR_TIME_SET_ZERO(start);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (sync_method)
@@ -8235,18 +8229,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 
 	pgstat_report_wait_end();
 
-	/*
-	 * Increment the I/O timing and the number of times WAL files were synced.
-	 */
-	if (track_wal_io_timing)
-	{
-		instr_time	duration;
-
-		INSTR_TIME_SET_CURRENT(duration);
-		INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start);
-	}
-
-	PendingWalStats.wal_sync++;
+	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC,
+							io_start, 1);
 }
 
 /*
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index becc2bda62e..ee850af5514 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1587,6 +1587,7 @@ PerformWalRecovery(void)
 	XLogRecord *record;
 	bool		reachedRecoveryTarget = false;
 	TimeLineID	replayTLI;
+	uint32		pgstat_report_wal_frequency = 0;
 
 	/*
 	 * Initialize shared variables for tracking progress of WAL replay, as if
@@ -1745,6 +1746,16 @@ PerformWalRecovery(void)
 			 */
 			ApplyWalRecord(xlogreader, record, &replayTLI);
 
+			/*
+			 * Report pending statistics to the cumulative stats system once
+			 * every PGSTAT_REPORT_FREQUENCY times to not hinder performance.
+			 */
+			if (pgstat_report_wal_frequency++ == PGSTAT_REPORT_FREQUENCY)
+			{
+				pgstat_report_wal(false);
+				pgstat_report_wal_frequency = 0;
+			}
+
 			/* Exit loop if we reached inclusive recovery target */
 			if (recoveryStopsAfter(xlogreader))
 			{
@@ -3220,6 +3231,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
 	uint32		targetPageOff;
 	XLogSegNo	targetSegNo PG_USED_FOR_ASSERTS_ONLY;
 	int			r;
+	instr_time	io_start;
 
 	XLByteToSeg(targetPagePtr, targetSegNo, wal_segment_size);
 	targetPageOff = XLogSegmentOffset(targetPagePtr, wal_segment_size);
@@ -3312,6 +3324,8 @@ retry:
 	/* Read the requested page */
 	readOff = targetPageOff;
 
+	io_start = pgstat_prepare_io_time();
+
 	pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 	r = pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);
 	if (r != XLOG_BLCKSZ)
@@ -3340,6 +3354,9 @@ retry:
 	}
 	pgstat_report_wait_end();
 
+	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_READ,
+							io_start, 1);
+
 	Assert(targetSegNo == readSegNo);
 	Assert(targetPageOff == readOff);
 	Assert(reqLen <= readLen);
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index eb7d35d4225..97689a0637e 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -87,17 +87,25 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
 	Assert((unsigned int) io_op < IOOP_NUM_TYPES);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
+	if(io_object == IOOBJECT_WAL && io_context == IOCONTEXT_NORMAL &&
+	   io_op == IOOP_FSYNC)
+		PendingWalStats.wal_sync += cnt;
+
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 
 	have_iostats = true;
 }
 
+/*
+ * Prepares io_time for pgstat_count_io_op_time() function. It needs to return
+ * current time if there is a chance that any 'time' can be tracked.
+ */
 instr_time
 pgstat_prepare_io_time(void)
 {
 	instr_time	io_start;
 
-	if (track_io_timing)
+	if(track_io_timing || track_wal_io_timing)
 		INSTR_TIME_SET_CURRENT(io_start);
 	else
 		INSTR_TIME_SET_ZERO(io_start);
@@ -105,6 +113,22 @@ pgstat_prepare_io_time(void)
 	return io_start;
 }
 
+/*
+ * Decide if the io timing needs be tracked
+ */
+bool
+pgstat_should_track_io_time(IOObject io_object, IOContext io_context)
+{
+	/*
+	 * io times of IOOBJECT_WAL IOObject needs to be tracked when
+	 * 'track_wal_io_timing' is set regardless of 'track_io_timing'.
+	 */
+	if (io_object == IOOBJECT_WAL)
+		return track_wal_io_timing;
+
+	return track_io_timing;
+}
+
 /*
  * Like pgstat_count_io_op_n() except it also accumulates time.
  */
@@ -112,7 +136,7 @@ void
 pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 						instr_time start_time, uint32 cnt)
 {
-	if (track_io_timing)
+	if (pgstat_should_track_io_time(io_object, io_context))
 	{
 		instr_time	io_time;
 
@@ -124,6 +148,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 			pgstat_count_buffer_write_time(INSTR_TIME_GET_MICROSEC(io_time));
 			if (io_object == IOOBJECT_RELATION)
 				INSTR_TIME_ADD(pgBufferUsage.blk_write_time, io_time);
+			/* Track IOOBJECT_WAL/IOCONTEXT_NORMAL times on PendingWalStats */
+			else if (io_object == IOOBJECT_WAL && io_context == IOCONTEXT_NORMAL)
+				INSTR_TIME_ADD(PendingWalStats.wal_write_time, io_time);
 		}
 		else if (io_op == IOOP_READ)
 		{
@@ -131,6 +158,12 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 			if (io_object == IOOBJECT_RELATION)
 				INSTR_TIME_ADD(pgBufferUsage.blk_read_time, io_time);
 		}
+		else if (io_op == IOOP_FSYNC)
+		{
+			/* Track IOOBJECT_WAL/IOCONTEXT_NORMAL times on PendingWalStats */
+			if (io_object == IOOBJECT_WAL && io_context == IOCONTEXT_NORMAL)
+				INSTR_TIME_ADD(PendingWalStats.wal_sync_time, io_time);
+		}
 
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
@@ -216,12 +249,32 @@ pgstat_get_io_context_name(IOContext io_context)
 			return "normal";
 		case IOCONTEXT_VACUUM:
 			return "vacuum";
+		case IOCONTEXT_INIT:
+			return "init";
 	}
 
 	elog(ERROR, "unrecognized IOContext value: %d", io_context);
 	pg_unreachable();
 }
 
+/*
+ * op_bytes can change according to IOObject and IOContext.
+ * Return BLCKSZ as default.
+ */
+int
+pgstat_get_io_op_btyes(IOObject io_object, IOContext io_context)
+{
+	if (io_object == IOOBJECT_WAL)
+	{
+		if (io_context == IOCONTEXT_NORMAL)
+			return XLOG_BLCKSZ;
+		else if (io_context == IOCONTEXT_INIT)
+			return wal_segment_size;
+	}
+
+	return BLCKSZ;
+}
+
 const char *
 pgstat_get_io_object_name(IOObject io_object)
 {
@@ -231,6 +284,8 @@ pgstat_get_io_object_name(IOObject io_object)
 			return "relation";
 		case IOOBJECT_TEMP_RELATION:
 			return "temp relation";
+		case IOOBJECT_WAL:
+			return "wal";
 	}
 
 	elog(ERROR, "unrecognized IOObject value: %d", io_object);
@@ -312,10 +367,10 @@ pgstat_tracks_io_bktype(BackendType bktype)
 		case B_INVALID:
 		case B_ARCHIVER:
 		case B_LOGGER:
-		case B_WAL_RECEIVER:
-		case B_WAL_WRITER:
 			return false;
 
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
 		case B_AUTOVAC_LAUNCHER:
 		case B_AUTOVAC_WORKER:
 		case B_BACKEND:
@@ -350,6 +405,15 @@ pgstat_tracks_io_object(BackendType bktype, IOObject io_object,
 	if (!pgstat_tracks_io_bktype(bktype))
 		return false;
 
+	/*
+	 * Currently, IO on IOOBJECT_WAL IOObject can only occur in the
+	 * IOCONTEXT_NORMAL and IOCONTEXT_INIT IOContext.
+	 */
+	if (io_object == IOOBJECT_WAL &&
+		(io_context != IOCONTEXT_NORMAL &&
+	     io_context != IOCONTEXT_INIT))
+		return false;
+
 	/*
 	 * Currently, IO on temporary relations can only occur in the
 	 * IOCONTEXT_NORMAL IOContext.
@@ -439,6 +503,14 @@ pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 	if (io_context == IOCONTEXT_BULKREAD && io_op == IOOP_EXTEND)
 		return false;
 
+	if(io_object == IOOBJECT_WAL && io_context == IOCONTEXT_INIT &&
+	   !(io_op == IOOP_WRITE || io_op == IOOP_FSYNC))
+	   return false;
+
+	if(io_object == IOOBJECT_WAL && io_context == IOCONTEXT_NORMAL &&
+	   !(io_op == IOOP_WRITE || io_op == IOOP_READ || io_op == IOOP_FSYNC))
+	   return false;
+
 	strategy_io_context = io_context == IOCONTEXT_BULKREAD ||
 		io_context == IOCONTEXT_BULKWRITE || io_context == IOCONTEXT_VACUUM;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2a4c8ef87ff..c467891e5fb 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1386,6 +1386,7 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 			for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
 			{
 				const char *context_name = pgstat_get_io_context_name(io_context);
+				int			op_bytes;
 
 				Datum		values[IO_NUM_COLUMNS] = {0};
 				bool		nulls[IO_NUM_COLUMNS] = {0};
@@ -1409,7 +1410,8 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 				 * and constant multipliers, once non-block-oriented IO (e.g.
 				 * temporary file IO) is tracked.
 				 */
-				values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+				op_bytes = pgstat_get_io_op_btyes(io_obj, io_context);
+				values[IO_COL_CONVERSION] = Int64GetDatum(op_bytes);
 
 				for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
 				{
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 57a2c0866a2..04deb2aa509 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -277,9 +277,10 @@ typedef enum IOObject
 {
 	IOOBJECT_RELATION,
 	IOOBJECT_TEMP_RELATION,
+	IOOBJECT_WAL,
 } IOObject;
 
-#define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1)
+#define IOOBJECT_NUM_TYPES (IOOBJECT_WAL + 1)
 
 typedef enum IOContext
 {
@@ -287,9 +288,10 @@ typedef enum IOContext
 	IOCONTEXT_BULKWRITE,
 	IOCONTEXT_NORMAL,
 	IOCONTEXT_VACUUM,
+	IOCONTEXT_INIT,
 } IOContext;
 
-#define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1)
+#define IOCONTEXT_NUM_TYPES (IOCONTEXT_INIT + 1)
 
 typedef enum IOOp
 {
@@ -439,6 +441,12 @@ typedef struct PgStat_WalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
+/*
+ * When stats captured in resource-intensive/tight loops, report pending stats
+ * once every PGSTAT_REPORT_FREQUENCY times.
+ */
+#define PGSTAT_REPORT_FREQUENCY		1024
+
 /*
  * This struct stores wal-related durations as instr_time, which makes it
  * cheaper and easier to accumulate them, by not requiring type
@@ -521,10 +529,12 @@ extern bool pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
 extern void pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op);
 extern void pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint32 cnt);
 extern instr_time pgstat_prepare_io_time(void);
+extern bool pgstat_should_track_io_time(IOObject io_object, IOContext io_context);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time, uint32 cnt);
 
 extern PgStat_IO *pgstat_fetch_stat_io(void);
+extern int pgstat_get_io_op_btyes(IOObject io_object, IOContext io_context);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
@@ -770,6 +780,8 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
 /*
  * Variables in pgstat_wal.c
  */
+extern PGDLLIMPORT bool track_wal_io_timing;
+extern PGDLLIMPORT int wal_segment_size;
 
 /* updated directly by backends and background processes */
 extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
-- 
2.40.1

Reply via email to