On Thu, Mar 9, 2023 at 2:43 PM Andres Freund <and...@anarazel.de> wrote: > On 2023-03-09 06:51:31 -0600, Justin Pryzby wrote: > > On Tue, Mar 07, 2023 at 10:18:44AM -0800, Andres Freund wrote: > > There's a 2nd portion of the test that's still flapping, at least on > > cirrusci. > > > > The issue that Tom mentioned is at: > > SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; > > > > But what I've seen on cirrusci is at: > > SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; > > Seems you meant to copy a different line for Tom's (s/writes/redas/)? > > > > https://api.cirrus-ci.com/v1/artifact/task/6701069548388352/log/src/test/recovery/tmp_check/regression.diffs > > Hm. I guess the explanation here is that the buffers were already all written > out by another backend. Which is made more likely by your patch. > > > I found a few more occurances and chatted with Melanie. Melanie will come up > with a fix I think.
So, what this test is relying on is that either the checkpointer or another backend will flush the pages of test_io_shared which we dirtied above in the test. The test specifically checks for IOCONTEXT_NORMAL writes. It could fail if some other backend is doing a bulkread or bulkwrite and flushes these buffers first in a strategy context. This will happen more often when shared buffers is small. I tried to come up with a reliable test which was limited to IOCONTEXT_NORMAL. I thought if we could guarantee a dirty buffer would be pinned using a cursor, that we could then issue a checkpoint and guarantee a flush that way. However, I don't see a way to guarantee that no one flushes the buffer between dirtying it and pinning it with the cursor. So, I think our best bet is to just change the test to pass if there are any writes in any contexts. By moving the sum(writes) before the INSERT and keeping the checkpoint, we can guarantee that someway or another, some buffers will be flushed. This essentially covers the same code anyway. Patch attached. - Melanie
From 0d2904f2cf3b6cbf016e5701aaa2bc6997b505cc Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Fri, 10 Mar 2023 14:26:37 -0500 Subject: [PATCH v1] Stabilize pg_stat_io writes test Counting writes only for io_context = 'normal' is unreliable, as backends using a buffer access strategy could flush all of the dirty buffers out from under the other backends and checkpointer. Change the test to count writes in any context. This achieves roughly the same coverage anyway. Reported-by: Justin Pryzby <pry...@telsasoft.com> Discussion: https://www.postgresql.org/message-id/ZAnWU8WbXEDjrfUE%40telsasoft.com --- src/test/regress/expected/stats.out | 8 ++++---- src/test/regress/sql/stats.sql | 9 ++++----- 2 files changed, 8 insertions(+), 9 deletions(-) diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out index 186c296299..e90940f676 100644 --- a/src/test/regress/expected/stats.out +++ b/src/test/regress/expected/stats.out @@ -1137,6 +1137,9 @@ SELECT pg_stat_get_subscription_stats(NULL); -- extends. SELECT sum(extends) AS io_sum_shared_before_extends FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE io_object = 'relation' \gset io_sum_shared_before_ CREATE TABLE test_io_shared(a int); INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i; SELECT pg_stat_force_next_flush(); @@ -1155,15 +1158,12 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends; -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes -- and fsyncs. -SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs - FROM pg_stat_io - WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_before_ -- See comment above for rationale for two explicit CHECKPOINTs. CHECKPOINT; CHECKPOINT; SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs FROM pg_stat_io - WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_after_ + WHERE io_object = 'relation' \gset io_sum_shared_after_ SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; ?column? ---------- diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql index d7f873cfc9..b94410e49e 100644 --- a/src/test/regress/sql/stats.sql +++ b/src/test/regress/sql/stats.sql @@ -549,6 +549,9 @@ SELECT pg_stat_get_subscription_stats(NULL); -- extends. SELECT sum(extends) AS io_sum_shared_before_extends FROM pg_stat_io WHERE io_context = 'normal' AND io_object = 'relation' \gset +SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs + FROM pg_stat_io + WHERE io_object = 'relation' \gset io_sum_shared_before_ CREATE TABLE test_io_shared(a int); INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i; SELECT pg_stat_force_next_flush(); @@ -558,16 +561,12 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends; -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes -- and fsyncs. -SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs - FROM pg_stat_io - WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_before_ -- See comment above for rationale for two explicit CHECKPOINTs. CHECKPOINT; CHECKPOINT; SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs FROM pg_stat_io - WHERE io_context = 'normal' AND io_object = 'relation' \gset io_sum_shared_after_ - + WHERE io_object = 'relation' \gset io_sum_shared_after_ SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes; SELECT current_setting('fsync') = 'off' OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs; -- 2.37.2