On Thu, Mar 9, 2023 at 2:43 PM Andres Freund <and...@anarazel.de> wrote:
> On 2023-03-09 06:51:31 -0600, Justin Pryzby wrote:
> > On Tue, Mar 07, 2023 at 10:18:44AM -0800, Andres Freund wrote:
> > There's a 2nd portion of the test that's still flapping, at least on
> > cirrusci.
> >
> > The issue that Tom mentioned is at:
> >  SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
> >
> > But what I've seen on cirrusci is at:
> >  SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
>
> Seems you meant to copy a different line for Tom's (s/writes/redas/)?
>
>
> > https://api.cirrus-ci.com/v1/artifact/task/6701069548388352/log/src/test/recovery/tmp_check/regression.diffs
>
> Hm. I guess the explanation here is that the buffers were already all written
> out by another backend. Which is made more likely by your patch.
>
>
> I found a few more occurances and chatted with Melanie. Melanie will come up
> with a fix I think.

So, what this test is relying on is that either the checkpointer or
another backend will flush the pages of test_io_shared which we dirtied
above in the test. The test specifically checks for IOCONTEXT_NORMAL
writes. It could fail if some other backend is doing a bulkread or
bulkwrite and flushes these buffers first in a strategy context.
This will happen more often when shared buffers is small.

I tried to come up with a reliable test which was limited to
IOCONTEXT_NORMAL. I thought if we could guarantee a dirty buffer would
be pinned using a cursor, that we could then issue a checkpoint and
guarantee a flush that way. However, I don't see a way to guarantee that
no one flushes the buffer between dirtying it and pinning it with the
cursor.

So, I think our best bet is to just change the test to pass if there are
any writes in any contexts. By moving the sum(writes) before the INSERT
and keeping the checkpoint, we can guarantee that someway or another,
some buffers will be flushed. This essentially covers the same code anyway.

Patch attached.

- Melanie
From 0d2904f2cf3b6cbf016e5701aaa2bc6997b505cc Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplage...@gmail.com>
Date: Fri, 10 Mar 2023 14:26:37 -0500
Subject: [PATCH v1] Stabilize pg_stat_io writes test

Counting writes only for io_context = 'normal' is unreliable, as
backends using a buffer access strategy could flush all of the dirty
buffers out from under the other backends and checkpointer. Change the
test to count writes in any context. This achieves roughly the same
coverage anyway.

Reported-by: Justin Pryzby <pry...@telsasoft.com>
Discussion: https://www.postgresql.org/message-id/ZAnWU8WbXEDjrfUE%40telsasoft.com
---
 src/test/regress/expected/stats.out | 8 ++++----
 src/test/regress/sql/stats.sql      | 9 ++++-----
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 186c296299..e90940f676 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1137,6 +1137,9 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE io_object = 'relation' \gset io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1155,15 +1158,12 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs.
-SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
-  FROM pg_stat_io
-  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_before_
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
-  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_after_
+  WHERE io_object = 'relation' \gset io_sum_shared_after_
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
  ?column? 
 ----------
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d7f873cfc9..b94410e49e 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -549,6 +549,9 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE io_object = 'relation' \gset io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -558,16 +561,12 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs.
-SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
-  FROM pg_stat_io
-  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_before_
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
-  WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_after_
-
+  WHERE io_object = 'relation' \gset io_sum_shared_after_
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
-- 
2.37.2

Reply via email to