On Fri, Apr 2, 2021 at 9:46 AM Michael Paquier <mich...@paquier.xyz> wrote:
> Okay, applied and back-patched down to 12 then. Thank you both. Unfortunately and surprisingly, the test still fails (perhaps even rarer, once in several hundred runs) under multimaster. After scratching the head for some more time, it seems to me the following happens: not only vacuum encounters locked page, but also there exist a concurrent backend (as the parallel schedule is run) who holds back oldestxmin keeping it less than xid of transaction which did the insertion INSERT INTO reloptions_test VALUES (1, NULL), (NULL, NULL); FreezeLimit can't be higher than oldestxmin, so lazy_check_needs_freeze decides there is nothing to freeze on the page. multimaster commits are quite heavy, which apparently shifts the timings making the issue more likely. Currently we are testing the rather funny attached patch which forces all such old-snapshot-holders to finish. It is crutchy, but I doubt we want to change vacuum logic (e.g. checking tuple liveness in lazy_check_needs_freeze) due to this issue. (it is especially crutchy in xid::bigint casts, but wraparound is hardly expected in regression tests run). -- cheers, arseny
diff --git a/src/test/regress/expected/reloptions.out b/src/test/regress/expected/reloptions.out index bb7bd6e1e7e..78bbf4a5255 100644 --- a/src/test/regress/expected/reloptions.out +++ b/src/test/regress/expected/reloptions.out @@ -128,6 +128,20 @@ SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass; INSERT INTO reloptions_test VALUES (1, NULL), (NULL, NULL); ERROR: null value in column "i" of relation "reloptions_test" violates not-null constraint DETAIL: Failing row contains (null, null). +do $$ +declare + my_xid bigint; + oldest_xmin bigint; +begin + my_xid := txid_current(); + while true loop + oldest_xmin := min(backend_xmin::text::bigint) from pg_stat_activity where pid != pg_backend_pid(); + exit when oldest_xmin is null or oldest_xmin >= my_xid; + perform pg_sleep(0.1); + perform pg_stat_clear_snapshot(); + end loop; +end +$$; -- Do an aggressive vacuum to prevent page-skipping. VACUUM FREEZE reloptions_test; SELECT pg_relation_size('reloptions_test') = 0; diff --git a/src/test/regress/sql/reloptions.sql b/src/test/regress/sql/reloptions.sql index 95f7ab4189e..96fb59d16ad 100644 --- a/src/test/regress/sql/reloptions.sql +++ b/src/test/regress/sql/reloptions.sql @@ -72,6 +72,20 @@ SELECT reloptions FROM pg_class WHERE oid = ALTER TABLE reloptions_test RESET (vacuum_truncate); SELECT reloptions FROM pg_class WHERE oid = 'reloptions_test'::regclass; INSERT INTO reloptions_test VALUES (1, NULL), (NULL, NULL); +do $$ +declare + my_xid bigint; + oldest_xmin bigint; +begin + my_xid := txid_current(); + while true loop + oldest_xmin := min(backend_xmin::text::bigint) from pg_stat_activity where pid != pg_backend_pid(); + exit when oldest_xmin is null or oldest_xmin >= my_xid; + perform pg_sleep(0.1); + perform pg_stat_clear_snapshot(); + end loop; +end +$$; -- Do an aggressive vacuum to prevent page-skipping. VACUUM FREEZE reloptions_test; SELECT pg_relation_size('reloptions_test') = 0;