Hi,

On 2022-12-08 16:15:11 -0800, Andres Freund wrote:
> The most frequent case is postgres_fdw, which somewhat regularly fails with a
> regression.diff like this:
>
> diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
> --- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 2022-12-08 20:35:24.772888000 +0000
> +++ /tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out 2022-12-08 20:43:38.199450000 +0000
> @@ -9911,8 +9911,7 @@
>    WHERE application_name = 'fdw_retry_check';
>   pg_terminate_backend
>  ----------------------
> - t
> -(1 row)
> +(0 rows)
>
>  -- This query should detect the broken connection when starting new remote
>  -- transaction, reestablish new connection, and then succeed.
>
>
> See e.g.
> https://cirrus-ci.com/task/5925540020879360
> https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/postgres_fdw-running/regress/regression.diffs
> https://api.cirrus-ci.com/v1/artifact/task/5925540020879360/testrun/build/testrun/runningcheck.log
>
>
> The following comment in the test provides a hint what might be happening:
>
> -- If debug_discard_caches is active, it results in
> -- dropping remote connections after every transaction, making it
> -- impossible to test termination meaningfully.  So turn that off
> -- for this test.
> SET debug_discard_caches = 0;
>
>
> I guess that a cache reset message arrives and leads to the connection being
> terminated. Unfortunately that's hard to see right now, as the relevant log
> messages are output with DEBUG3 - it's quite verbose, so enabling it for all
> tests will be painful.
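
One way to see those messages without raising the log level for every test
might be to bump it only around the affected statements. A rough sketch,
assuming the test session runs as a superuser (needed to change
log_min_messages) and that the relevant DEBUG3 messages come from the local
backend; the placement inside the test file is hypothetical:

```sql
-- Hypothetical: raise server log verbosity only for the flaky section.
SET log_min_messages = DEBUG3;

-- ... existing termination / retry test statements go here, unchanged ...

-- Restore the configured default so the rest of the run stays quiet.
RESET log_min_messages;
```

Since log_min_messages only affects the server log, not what is sent to the
client, this shouldn't change the expected output file.
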
Downthread I reported that I was able to pinpoint that the source of the issue
is indeed a cache inval message arriving at the wrong moment.

We've had trouble with this test for years by now. We added workarounds, like

commit 1273a15bf91fa322915e32d3b6dc6ec916397268
Author: Tom Lane <t...@sss.pgh.pa.us>
Date:   2021-05-04 13:36:26 -0400

    Disable cache clobber to avoid breaking postgres_fdw termination test.

But that didn't suffice to make it reliable. Not entirely surprising, given
there are cache reset sources other than clobber cache.

Unless somebody comes up with a way to make the test more reliable pretty
soon, I think we should just remove it. It's one of the most frequently
flapping tests at the moment.

Greetings,

Andres Freund