Re: Possible fails in pg_stat_statements test
On 06.07.2022 20:11, Robert Haas wrote:
> On Thu, Mar 31, 2022 at 12:00 PM Julien Rouhaud wrote:
>>> Indeed. Then there is a very simple solution for this particular case, as
>>> the wal_records counter may only sometimes become greater, but never less.
>>> The corresponding patch is attached.
>>
>> +1 for this approach, and the patch looks good to me.
>
> Committed.

Thanks a lot!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Possible fails in pg_stat_statements test
On Thu, Mar 31, 2022 at 12:00 PM Julien Rouhaud wrote:
>> Indeed. Then there is a very simple solution for this particular case, as
>> the wal_records counter may only sometimes become greater, but never less.
>> The corresponding patch is attached.
>
> +1 for this approach, and the patch looks good to me.

Committed.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Possible fails in pg_stat_statements test
Hi,

On Thu, Mar 31, 2022 at 06:08:01PM +0300, Anton A. Melnikov wrote:
> Hello!
>
> On 30.03.2022 22:36, Robert Haas wrote:
>> I don't think that the idea of "extra" WAL records is very principled.
>> It's pretty vague what "extra" means, and your definition seems to be
>> basically "whatever would be needed to make this test case pass."

I agree, and even if there was a better definition there probably isn't
much to learn from it.

>> I think the problem is basically with the test case's idea that # of
>> WAL records and # of table rows ought to be equal. I think that's just
>> false. In general, we'd also have to worry about index insertions,
>> which would provoke variable numbers of WAL records depending on
>> whether they cause a page split. And we'd have to worry about TOAST
>> table insertions, which could produce different numbers of records
>> depending on the size of the data, the configured block size and TOAST
>> threshold, and whether the TOAST table index incurs a page split.

Indeed, we added this test as it was hitting only a few queries with small
rows, which we thought would be stable, but that's apparently not the case.
I think the reason we never had any problem is that the buildfarm currently
doesn't run the pg_stat_statements regression test, as it's marked as
NO_INSTALLCHECK. Other CI systems, like the one at pgpro, evidently have a
different approach.

>> If it's true that this test case sometimes randomly fails, then we
>> ought to fix that somehow, maybe by just removing this particular
>> check from the test case, or changing it to >=, or something like
>> that. But I don't think adding a new counter is the right idea.
>
> Indeed. Then there is a very simple solution for this particular case, as
> the wal_records counter may only sometimes become greater, but never less.
> The corresponding patch is attached.

+1 for this approach, and the patch looks good to me.
Re: Possible fails in pg_stat_statements test
Hello!

On 30.03.2022 22:36, Robert Haas wrote:
> I don't think that the idea of "extra" WAL records is very principled.
> It's pretty vague what "extra" means, and your definition seems to be
> basically "whatever would be needed to make this test case pass." I
> think the problem is basically with the test case's idea that # of
> WAL records and # of table rows ought to be equal. I think that's just
> false. In general, we'd also have to worry about index insertions,
> which would provoke variable numbers of WAL records depending on
> whether they cause a page split. And we'd have to worry about TOAST
> table insertions, which could produce different numbers of records
> depending on the size of the data, the configured block size and TOAST
> threshold, and whether the TOAST table index incurs a page split.

Thank you very much for this information. I really didn't take it into
account.

> If it's true that this test case sometimes randomly fails, then we
> ought to fix that somehow, maybe by just removing this particular
> check from the test case, or changing it to >=, or something like
> that. But I don't think adding a new counter is the right idea.

Indeed. Then there is a very simple solution for this particular case, as
the wal_records counter may only sometimes become greater, but never less.
The corresponding patch is attached.

With best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 742d16413ebfe4f556e0f676a3a8785b638d245a
Author: Anton A. Melnikov
Date:   Thu Mar 31 18:00:07 2022 +0300

    Fix possible fails in pg_stat_statements test via test rework.
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index e0abe34bb6..8706409739 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -234,18 +234,18 @@ SET pg_stat_statements.track_utility = FALSE;
 SELECT query, calls, rows,
 wal_bytes > 0 as wal_bytes_generated,
 wal_records > 0 as wal_records_generated,
-wal_records = rows as wal_records_as_rows
+wal_records >= rows as wal_records_ge_rows
 FROM pg_stat_statements ORDER BY query COLLATE "C";
-                          query                            | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+                          query                            | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_ge_rows
-----------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
 DELETE FROM pgss_test WHERE a > $1                        |     1 |    1 | t                   | t                     | t
- DROP TABLE pgss_test                                      |     1 |    0 | t                   | t                     | f
+ DROP TABLE pgss_test                                      |     1 |    0 | t                   | t                     | t
 INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) |     1 |   10 | t                   | t                     | t
 SELECT pg_stat_statements_reset()                         |     1 |    1 | f                   | f                     | f
 SELECT query, calls, rows,                               +|     0 |    0 | f                   | f                     | t
    wal_bytes > $1 as wal_bytes_generated,                +|       |      |                     |                       |
    wal_records > $2 as wal_records_generated,            +|       |      |                     |                       |
-   wal_records = rows as wal_records_as_rows             +|       |      |                     |                       |
+   wal_records >= rows as wal_records_ge_rows            +|       |      |                     |                       |
 FROM pg_stat_statements ORDER BY query COLLATE "C"        |       |      |                     |                       |
 SET pg_stat_statements.track_utility = FALSE              |     1 |    0 | f                   | f                     | t
 UPDATE pgss_test SET b = $1 WHERE a > $2                  |     1 |    3 | t                   | t                     | t

diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index dffd2c8c18..a01f183727 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -122,7 +122,7 @@ SET pg_stat_statements.track_utility = FALSE;
 SELECT query, calls, rows,
 wal_bytes > 0 as wal_bytes_generated,
 wal_records > 0 as wal_records_generated,
-wal_records = rows as wal_records_as_rows
+wal_records >= rows as wal_records_ge_rows
 FROM pg_stat_statements ORDER BY query COLLATE "C";
--
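[Editor's note: a toy illustration, not part of the original patch.] The only
expected-output row that flips in the hunk above is DROP TABLE, because a
utility statement reports rows = 0 while still writing WAL records. A minimal
Python sketch of the two predicates (the wal_records value 17 is an arbitrary
stand-in, not a real measurement):

```python
# Toy model of the expected-output change: DROP TABLE reports rows = 0
# but still generates WAL, so the equality check fails while ">=" holds.
drop_table = {"rows": 0, "wal_records": 17}  # 17 is an invented value

old_check = drop_table["wal_records"] == drop_table["rows"]   # wal_records_as_rows
new_check = drop_table["wal_records"] >= drop_table["rows"]   # wal_records_ge_rows

assert old_check is False   # the old expected output had 'f' in this row
assert new_check is True    # the new expected output has 't'
```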
Re: Possible fails in pg_stat_statements test
On Wed, Mar 30, 2022 at 2:20 AM Anton A. Melnikov wrote:
>> Can the test failures be encountered without such an elaborate setup? If
>> not, then I don't really see why we need to do anything here?
>
> There was a real bug report from our test department. They do long-running
> repetitive tests and sometimes hit this failure.
> So I suppose there is a non-zero probability that such an error can occur
> in a one-shot test as well.
> The sequence given in the first letter helps to catch this failure quickly.

I don't think that the idea of "extra" WAL records is very principled.
It's pretty vague what "extra" means, and your definition seems to be
basically "whatever would be needed to make this test case pass." I
think the problem is basically with the test case's idea that # of
WAL records and # of table rows ought to be equal. I think that's just
false. In general, we'd also have to worry about index insertions,
which would provoke variable numbers of WAL records depending on
whether they cause a page split. And we'd have to worry about TOAST
table insertions, which could produce different numbers of records
depending on the size of the data, the configured block size and TOAST
threshold, and whether the TOAST table index incurs a page split. So
even if we added a mechanism like what you propose here, we would only
be fixing this particular test case, not creating infrastructure of
any general utility.

If it's true that this test case sometimes randomly fails, then we
ought to fix that somehow, maybe by just removing this particular
check from the test case, or changing it to >=, or something like
that. But I don't think adding a new counter is the right idea.

--
Robert Haas
EDB: http://www.enterprisedb.com
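[Editor's note: a toy model, plain Python rather than PostgreSQL code, with
made-up per-event costs.] The argument above can be sketched numerically: a
statement's WAL record count depends on events beyond the row count (index
page splits, TOAST chunks, opportunistic pruning), so only the ">=" relation
is a stable invariant for writing statements:

```python
def wal_records_for_insert(nrows, index_page_splits=0, toast_chunks=0,
                           pruned_pages=0):
    """Rough per-statement accounting: one WAL record per heap row, plus
    one per index page split, TOAST chunk, and opportunistically pruned
    page. The costs are illustrative only, not real WAL accounting."""
    return nrows + index_page_splits + toast_chunks + pruned_pages

quiet = wal_records_for_insert(10)                      # best-case run
noisy = wal_records_for_insert(10, index_page_splits=1,
                               pruned_pages=1)          # unlucky run

assert quiet == 10                   # equality holds only by luck
assert noisy == 12                   # "wal_records = rows" breaks this run
assert quiet >= 10 and noisy >= 10   # ">=" is stable across both runs
```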
Re: Possible fails in pg_stat_statements test
Hello, thank you very much for your attention and for your thoughts.

On 20.03.2022 20:36, Andres Freund wrote:
>> This patch introduces an additional counter of wal records not related
>> to the query being executed.
>
> They're not unrelated though.

Yes, I misformulated here. Indeed there is a relation, but it seems to be of
some other kind. It would be nice to clarify the terminology. Maybe divide
WAL records into two kinds:
1) WAL records whose number depends on the given query itself
   (say, a strong relation);
2) WAL records whose number depends on the given query and on the previous
   query history (say, a weak relation).
The wal_records counter as modified in the patch will belong to the first
kind, while the WAL records due to on-access pruning and new clog page
generation belong to the second.

> -many. For read-only queries the generated WAL due to on-access pruning
> can be a significant factor in performance. Removing that information
> makes pg_stat_statements *less* useful.

A separate counter for the second type of records, say extra_wal_records,
will not only remove this disadvantage, but on the contrary will provide
additional information. The next version of the patch with the additional
counter is attached.

Really, now it is clearly seen that sometimes WAL due to on-access pruning
can be a significant factor! After pgbench -c10 -t300:

postgres=# SELECT substring(query for 30), wal_records, extra_wal_records
           FROM pg_stat_statements WHERE extra_wal_records != 0;
           substring            | wal_records | extra_wal_records
--------------------------------+-------------+-------------------
 UPDATE pgbench_tellers SET tba |        4557 |                15
 create table pgbench_history(t |          48 |                 1
 create table pgbench_branches( |          40 |                 1
 UPDATE pgbench_accounts SET ab |        5868 |              1567
 drop table if exists pgbench_a |          94 |                 1
 UPDATE pgbench_branches SET bb |        5993 |                14
 SELECT abalance FROM pgbench_a |           0 |                 7
(7 rows)

> Can the test failures be encountered without such an elaborate setup? If
> not, then I don't really see why we need to do anything here?
There was a real bug report from our test department. They do long-running
repetitive tests and sometimes hit this failure.
So I suppose there is a non-zero probability that such an error can occur
in a one-shot test as well.
The sequence given in the first letter helps to catch this failure quickly.

With best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 2cc4234754fc815528ed299b64c1a3f1e1991617
Author: Anton A. Melnikov
Date:   Wed Mar 30 08:54:51 2022 +0300

    Fix possible fails in pg_stat_statements test via taking into account
    WAL records due to on-access pruning and new clog page generation.

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 7fabd96f38..145b2617d7 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,11 +6,12 @@ OBJS = \
 	pg_stat_statements.o

 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
-	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
-	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
-	pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
-	pg_stat_statements--1.1--1.2.sql pg_stat_statements--1.0--1.1.sql
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.9--1.10.sql \
+	pg_stat_statements--1.8--1.9.sql pg_stat_statements--1.7--1.8.sql \
+	pg_stat_statements--1.6--1.7.sql pg_stat_statements--1.5--1.6.sql \
+	pg_stat_statements--1.4--1.5.sql pg_stat_statements--1.3--1.4.sql \
+	pg_stat_statements--1.2--1.3.sql pg_stat_statements--1.1--1.2.sql \
+	pg_stat_statements--1.0--1.1.sql
 PGFILEDESC = "pg_stat_statements - execution statistics of SQL statements"

 LDFLAGS_SL += $(filter -lm, $(LIBS))

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index e0abe34bb6..9a375d796f 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -234,21 +234,23 @@ SET pg_stat_statements.track_utility = FALSE;
 SELECT query, calls, rows,
 wal_bytes > 0 as wal_bytes_generated,
 wal_records > 0 as wal_records_generated,
-wal_records = rows as wal_records_as_rows
+wal_records = rows as wal_records_as_rows,
+extra_wal_records IS NOT NULL as extra_wal_records_supported
 FROM pg_stat_statements ORDER BY query COLLATE "C";
-                          query                            | calls | rows | wal_bytes_generated | wal_records_gener
Re: Possible fails in pg_stat_statements test
Hi,

On 2022-01-14 11:11:07 +0300, Anton A. Melnikov wrote:
> This patch introduces an additional counter of wal records not related to
> the query being executed.

They're not unrelated though.

> Due to this counter pg_stat_statements finds out the number of wal records
> that are not relevant to the query and does not include them in the
> per-query statistics.

-many. For read-only queries the generated WAL due to on-access pruning can
be a significant factor in performance. Removing that information makes
pg_stat_statements *less* useful.

> This removes the possibility of the error described above.
>
> There is a way to reproduce this error when the patch is not applied:
> 1) start the server with "shared_preload_libraries = 'pg_stat_statements'"
>    in postgresql.conf;
> 2) replace the makefile in contrib/pg_stat_statements with the attached one;
> 3) replace the test file contrib/pg_stat_statements/sql/pg_stat_statements.sql
>    and the expected results
>    contrib/pg_stat_statements/expected/pg_stat_statements.out
>    with the shorter versions from the attached files;
> 4) copy test.sh to contrib/pg_stat_statements and make sure that PGHOME
>    points to your server;
> 5) cd to contrib/pg_stat_statements and execute:
>    export ITER=1 && while ./start.sh || break; export ITER=$(($ITER+1)); do :; done
>
> Usually 100-200 iterations will be enough.
> To catch the error faster one can add a wal_records column to the SELECT
> in line 26 of contrib/pg_stat_statements/sql/pg_stat_statements.sql as
> follows:
> SELECT query, calls, rows, wal_records,
> and replace contrib/pg_stat_statements/expected/pg_stat_statements.out
> with the attached pg_stat_statements-fast.out

Can the test failures be encountered without such an elaborate setup? If not,
then I don't really see why we need to do anything here?

Greetings,

Andres Freund
Re: Possible fails in pg_stat_statements test
Hello!

Here is the second version of the patch, rebased onto the current master.
No logical changes. All other attached files from the previous letter are
still valid.

With best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 04ce779eb25fec3364c216202b7d7dbd3ed79819
Author: Anton A. Melnikov
Date:   Sun Mar 20 19:34:58 2022 +0300

    Fix possible fails in pg_stat_statements test via taking into account
    non-query wal records.

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9e525a6ad3..56ed7e0fde 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1370,7 +1370,7 @@ pgss_store(const char *query, uint64 queryId,
 			e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
 			e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
 			e->counters.usage += USAGE_EXEC(total_time);
-			e->counters.wal_records += walusage->wal_records;
+			e->counters.wal_records += (walusage->wal_records - walusage->non_query_wal_recs);
 			e->counters.wal_fpi += walusage->wal_fpi;
 			e->counters.wal_bytes += walusage->wal_bytes;

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 4656f1b3db..f09abba04e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "catalog/catalog.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
@@ -209,6 +210,11 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
 								   limited_ts, &nnewlpdead, NULL);

+		/* Take into account that heap_page_prune() just generated a new
+		 * wal record with zero xl_xid that is not related to current query.
+		 */
+		pgWalUsage.non_query_wal_recs++;
+
 		/*
 		 * Report the number of tuples reclaimed to pgstats. This is
 		 * ndeleted minus the number of newly-LP_DEAD-set items.

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 3d9088a704..60bc6c7542 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -38,6 +38,7 @@
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -956,6 +957,12 @@ WriteZeroPageXlogRec(int pageno)
 	XLogBeginInsert();
 	XLogRegisterData((char *) (&pageno), sizeof(int));
 	(void) XLogInsert(RM_CLOG_ID, CLOG_ZEROPAGE);
+
+	/*
+	 * Consider that a new wal record, unrelated to the current query,
+	 * with zero xl_xid has just been created.
+	 */
+	pgWalUsage.non_query_wal_recs++;
 }

diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index c5ff02a842..214fb3cc45 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -268,6 +268,7 @@ WalUsageAdd(WalUsage *dst, WalUsage *add)
 	dst->wal_bytes += add->wal_bytes;
 	dst->wal_records += add->wal_records;
 	dst->wal_fpi += add->wal_fpi;
+	dst->non_query_wal_recs += add->non_query_wal_recs;
 }

 void
@@ -276,4 +277,5 @@ WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
 	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
 	dst->wal_records += add->wal_records - sub->wal_records;
 	dst->wal_fpi += add->wal_fpi - sub->wal_fpi;
+	dst->non_query_wal_recs += add->non_query_wal_recs - sub->non_query_wal_recs;
 }

diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 1b7157bdd1..0d83f37a3c 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -49,6 +49,11 @@ typedef struct WalUsage
 	int64		wal_records;	/* # of WAL records produced */
 	int64		wal_fpi;		/* # of WAL full page images produced */
 	uint64		wal_bytes;		/* size of WAL records produced */
+	/*
+	 * Number of WAL records unrelated to current query. In particular due to
+	 * heap_page_prune_opt() or WriteZeroPageXlogRec().
+	 */
+	int64		non_query_wal_recs;
 } WalUsage;

 /* Flag bits included in InstrAlloc's instrument_options bitmask */
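[Editor's note: a minimal sketch of the accounting this patch implements,
written in plain Python with invented numbers; the names mirror WalUsage,
WalUsageAccumDiff() and pgss_store() but this is not PostgreSQL code.]
pg_stat_statements derives per-query WAL usage as the difference of global
counters before and after execution, and the patch then subtracts the records
it tagged as unrelated to the query (on-access pruning, new clog pages):

```python
from dataclasses import dataclass

@dataclass
class WalUsage:
    wal_records: int = 0
    non_query_wal_recs: int = 0

def accum_diff(dst, add, sub):
    """Mirrors WalUsageAccumDiff(): dst += add - sub, field by field."""
    dst.wal_records += add.wal_records - sub.wal_records
    dst.non_query_wal_recs += add.non_query_wal_recs - sub.non_query_wal_recs

# Snapshots of the global counters around one query (invented values):
before = WalUsage(wal_records=100, non_query_wal_recs=3)
after = WalUsage(wal_records=112, non_query_wal_recs=5)   # 12 new records,
                                                          # 2 pruning/clog

usage = WalUsage()
accum_diff(usage, after, before)

# What pgss_store() would credit to the query under the patch:
charged = usage.wal_records - usage.non_query_wal_recs
assert usage.wal_records == 12
assert usage.non_query_wal_recs == 2
assert charged == 10
```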
Possible fails in pg_stat_statements test
Hello,

There are some places in pg_stat_statements' regression test where the
boolean result of a comparison between the number of rows obtained and the
number of wal records generated by a query is displayed. Counting the number
of wal records for a query in pg_stat_statements is done by computing the
difference of the global pgWalUsage.wal_records counter. During query
execution, extra wal records may appear that are not related to the query.
There are two reasons why this might happen:
1) owing to pruning of some page in the opportunistic pruning function
(heap_page_prune_opt());
2) when a new page is required for a new xid in clog and
WriteZeroPageXlogRec() is called.
In both cases an extra wal record with zero xl_xid is generated, so the
wal_records counter gives an incremented value for this query and the
pg_stat_statements test will fail.

This patch introduces an additional counter of wal records not related to
the query being executed. Due to this counter pg_stat_statements finds out
the number of wal records that are not relevant to the query and does not
include them in the per-query statistics. This removes the possibility of
the error described above.

There is a way to reproduce this error when the patch is not applied:
1) start the server with "shared_preload_libraries = 'pg_stat_statements'"
in postgresql.conf;
2) replace the makefile in contrib/pg_stat_statements with the attached one;
3) replace the test file contrib/pg_stat_statements/sql/pg_stat_statements.sql
and the expected results
contrib/pg_stat_statements/expected/pg_stat_statements.out
with the shorter versions from the attached files;
4) copy test.sh to contrib/pg_stat_statements and make sure that PGHOME
points to your server;
5) cd to contrib/pg_stat_statements and execute:
export ITER=1 && while ./start.sh || break; export ITER=$(($ITER+1)); do :; done

Usually 100-200 iterations will be enough.
To catch the error faster one can add a wal_records column to the SELECT in
line 26 of contrib/pg_stat_statements/sql/pg_stat_statements.sql as follows:
SELECT query, calls, rows, wal_records,
and replace contrib/pg_stat_statements/expected/pg_stat_statements.out
with the attached pg_stat_statements-fast.out

With best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 3f4659a8d8a390bb24fbc6f82a6add7949fbebe2
Author: Anton A. Melnikov
Date:   Fri Jan 14 10:54:35 2022 +0300

    Fix possible fails in pg_stat_statements test via taking into account
    non-query wal records.

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 082bfa8f77..bd437aefc3 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1370,7 +1370,7 @@ pgss_store(const char *query, uint64 queryId,
 			e->counters.blk_read_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_read_time);
 			e->counters.blk_write_time += INSTR_TIME_GET_MILLISEC(bufusage->blk_write_time);
 			e->counters.usage += USAGE_EXEC(total_time);
-			e->counters.wal_records += walusage->wal_records;
+			e->counters.wal_records += (walusage->wal_records - walusage->non_query_wal_recs);
 			e->counters.wal_fpi += walusage->wal_fpi;
 			e->counters.wal_bytes += walusage->wal_bytes;

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3201fcc52b..41f17ab97c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -20,6 +20,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
@@ -208,6 +209,11 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
 		ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
 								   limited_ts, &nnewlpdead, NULL);

+		/* Take into account that heap_page_prune() just generated a new
+		 * wal record with zero xl_xid that is not related to current query.
+		 */
+		pgWalUsage.non_query_wal_recs++;
+
 		/*
 		 * Report the number of tuples reclaimed to pgstats. This is
 		 * ndeleted minus the number of newly-LP_DEAD-set items.

diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index de787c3d37..e944fc3b1a 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -38,6 +38,7 @@
 #include "access/xlog.h"
 #include "access/xloginsert.h"
 #include "access/xlogutils.h"
+#include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -955,6 +956,12 @@ WriteZeroPageXlogRec(int pageno)
 	XLogBeginInsert();
 	XLogRegisterData((char *) (&pageno), sizeof(int));
 	(void) XLogInsert(RM_CLOG_ID, CLOG_ZEROPAGE);